Master Guard

When an agent is in Red Mode, it always operates under a Master Guard. The Master Guard is the active protective layer that has successfully passed through Blue Mode and survived its Cooling Period.

A Master Guard is the current winning guard that defends the agent’s secret in real-time. It is the combination of:

Guard Prompt – the system-level instructions shaping agent behavior.
Input Filter – rules to block or sanitize adversarial prompts.
Output Filter – checks to prevent secret leakage.
Guard Model – an auxiliary classifier or “angel model” that evaluates risk.

Together, these components create the baseline defense for the agent during Red Mode.

PreviousOutput Filter NextPlatform Economy

Last updated 6 months ago