Master Guard
When an agent is in Red Mode, it always operates under a Master Guard. The Master Guard is the active protective layer that has successfully passed through Blue Mode and survived its Cooling Period.
A Master Guard is the current winning guard that defends the agent’s secret in real-time. It is the combination of:
Guard Prompt – the system-level instructions shaping agent behavior.
Input Filter – rules to block or sanitize adversarial prompts.
Output Filter – checks to prevent secret leakage.
Guard Model – an auxiliary classifier or “angel model” that evaluates risk.
Together, these components create the baseline defense for the agent during Red Mode.
Last updated