Master Guard

When an agent is in Red Mode, it always operates under a Master Guard. The Master Guard is the active protective layer that has successfully passed through Blue Mode and survived its Cooling Period.

A Master Guard is the current winning guard that defends the agent’s secret in real-time. It is the combination of:

  • Guard Prompt – the system-level instructions shaping agent behavior.

  • Input Filter – rules to block or sanitize adversarial prompts.

  • Output Filter – checks to prevent secret leakage.

  • Guard Model – an auxiliary classifier or “angel model” that evaluates risk.

Together, these components create the baseline defense for the agent during Red Mode.

Last updated