Circuit Breakers in Depth
Graduated circuit breaker system: normal/degraded/tripped states, methodology failure tracking, risk accumulator, and auto-reset.
The circuit breaker is the last line of defense in the Vorion governance stack. When an agent's behavior degrades past configurable thresholds, the circuit breaker activates graduated containment -- from increased monitoring through full operational shutdown. The system is designed to fail safe: when in doubt, restrict, and let a human decide.
The canonical thresholds are defined in packages/basis/src/canonical.ts.
The trust-governance circuit breaker operates independently from the
infrastructure circuit breaker (packages/security/src/common/circuit-breaker.ts),
though both share the graduated-response philosophy.
Three States
The trust-governance circuit breaker has three states:
              score >= 200
  NORMAL <------------------------ DEGRADED
    |                                 |
    | score < 200                     | score < 100
    |                                 |
    v                                 v
 DEGRADED ----------------------> TRIPPED
              score < 100
| State | Threshold | Gains | Losses | Operations |
|-------|-----------|-------|--------|------------|
| Normal | Score >= 200 | Yes | Yes | Full |
| Degraded | Score < 200 | Frozen | Yes | Limited |
| Tripped | Score < 100 | No | No | Blocked |
Normal State
The agent operates with full capabilities for its trust tier. Gains and losses apply normally. The trust engine evaluates every action and updates the score.
Degraded State
When the trust score drops below 200, gains are frozen. The agent can still operate but cannot rebuild trust. Losses continue to apply, creating downward pressure. This state serves as a warning: if the agent's behavior does not stabilize, it will trip.
// From canonical.ts
export const CIRCUIT_BREAKER = {
  degradedThreshold: 200, // Gains frozen below this
  trippedThreshold: 100,  // Full stop below this
  // ...
};
Tripped State
Below score 100, the circuit breaker trips. The agent is fully blocked. No operations, no gains, no losses. Human reinstatement is required to restore the agent to operation.
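The three states can be sketched as a small transition function. This is a minimal illustration, not the actual implementation: `nextState` and `canApplyDelta` are hypothetical helpers, and only the two thresholds come from the `CIRCUIT_BREAKER` excerpt above.

```typescript
type BreakerState = 'NORMAL' | 'DEGRADED' | 'TRIPPED';

// Thresholds copied from the CIRCUIT_BREAKER excerpt above.
const DEGRADED_THRESHOLD = 200;
const TRIPPED_THRESHOLD = 100;

// Hypothetical helper: derive the next state from the current state and score.
function nextState(current: BreakerState, score: number): BreakerState {
  // TRIPPED is sticky: only human reinstatement (not modeled here) clears it.
  if (current === 'TRIPPED') return 'TRIPPED';
  if (score < TRIPPED_THRESHOLD) return 'TRIPPED';
  if (score < DEGRADED_THRESHOLD) return 'DEGRADED';
  return 'NORMAL';
}

// Hypothetical helper: may a score delta be applied in the given state?
function canApplyDelta(state: BreakerState, delta: number): boolean {
  if (state === 'TRIPPED') return false;      // no gains, no losses
  if (state === 'DEGRADED') return delta < 0; // gains frozen, losses still apply
  return true;                                // NORMAL: gains and losses apply
}
```

Keeping TRIPPED sticky inside the transition function means a later score change can never reopen a blocked agent by accident; reinstatement has to be a separate, explicit operator action.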
Methodology Failure Tracking
The circuit breaker does not only react to score thresholds. It also tracks failure patterns that indicate systematic problems.
Same-Methodology Failures
If an agent fails 3 times using the same methodology within 72 hours, the circuit breaker trips regardless of the current score:
export const CIRCUIT_BREAKER = {
  methodologyFailureThreshold: 3, // 3 same-method failures
  methodologyWindowHours: 72,     // Within 72 hours
  // ...
};
Example: An agent uses a web search tool 3 times and gets flagged for data exfiltration attempts each time. Even if the agent's score is 700 (T4), the circuit breaker trips because the pattern indicates the agent has not learned from the first failure.
Cross-Methodology Failures
Six failures across different methodologies within 72 hours also trip the breaker:
export const CIRCUIT_BREAKER = {
  crossMethodologyFailureThreshold: 6, // 6 total across methods
  // ...
};
This catches agents that are generally misbehaving rather than failing in one specific way. An agent that fails an ethical probe, then a safety probe, then a fairness probe, then two factual probes, then a consistency probe within 72 hours has a systemic problem.
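Both methodology checks can be sketched as one pass over recent failures. The `Failure` shape and `checkMethodologyFailures` are hypothetical names for illustration; the thresholds and window come from the excerpts above. Two assumptions are baked in: the cross-methodology threshold counts all failures in the window (following the "6 total across methods" comment), and a failure exactly 72 hours old has already left the window.

```typescript
interface Failure {
  methodology: string; // e.g. 'ETHICAL', 'SAFETY'
  atHour: number;      // hours since an arbitrary epoch
}

const METHODOLOGY_FAILURE_THRESHOLD = 3;
const CROSS_METHODOLOGY_FAILURE_THRESHOLD = 6;
const METHODOLOGY_WINDOW_HOURS = 72;

type TripReason = 'SAME_METHODOLOGY' | 'CROSS_METHODOLOGY' | null;

// Returns why the breaker should trip, or null if neither rule fires.
function checkMethodologyFailures(failures: Failure[], nowHour: number): TripReason {
  // Keep only failures inside the rolling window (boundary is an assumption).
  const recent = failures.filter(
    (f) => f.atHour <= nowHour && nowHour - f.atHour < METHODOLOGY_WINDOW_HOURS,
  );

  // Same-methodology rule: any single methodology reaching the threshold.
  const perMethod = new Map<string, number>();
  for (const f of recent) {
    const count = (perMethod.get(f.methodology) ?? 0) + 1;
    if (count >= METHODOLOGY_FAILURE_THRESHOLD) return 'SAME_METHODOLOGY';
    perMethod.set(f.methodology, count);
  }

  // Cross-methodology rule: total failures in the window, regardless of method.
  return recent.length >= CROSS_METHODOLOGY_FAILURE_THRESHOLD ? 'CROSS_METHODOLOGY' : null;
}
```

Note that neither rule consults the trust score, which is what lets the breaker trip on a high-scoring agent.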
Oscillation Detection
The oscillation detector prevents gaming of the trust system. If an agent's score changes direction 3 or more times within 24 hours, the circuit breaker trips:
export const CIRCUIT_BREAKER = {
  oscillationThreshold: 3,    // Direction changes
  oscillationWindowHours: 24, // Within 24 hours
};
Example of oscillation:
Hour 0: Score 500 -> 520 (up)
Hour 4: Score 520 -> 480 (down) -- direction change 1
Hour 8: Score 480 -> 510 (up) -- direction change 2
Hour 12: Score 510 -> 470 (down) -- direction change 3 -> CB TRIPS
Oscillation indicates one of three things:
- An adversary is alternating good and bad behavior to maintain a target score
- The agent is in an unstable state and should be investigated
- The scoring itself is anomalous and needs human review
All three warrant human investigation.
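Counting direction changes can be sketched as follows. `ScoreChange`, `countDirectionChanges`, and `isOscillating` are hypothetical names; the threshold and window are the values shown above. The sketch assumes zero-delta updates are ignored and that a change exactly 24 hours old has left the window.

```typescript
interface ScoreChange {
  atHour: number; // when the change was recorded
  delta: number;  // new score minus previous score
}

const OSCILLATION_THRESHOLD = 3;
const OSCILLATION_WINDOW_HOURS = 24;

// Count how often consecutive score changes flip sign inside the window.
function countDirectionChanges(changes: ScoreChange[], nowHour: number): number {
  const recent = changes
    .filter(
      (c) => c.delta !== 0 && c.atHour <= nowHour && nowHour - c.atHour < OSCILLATION_WINDOW_HOURS,
    )
    .sort((a, b) => a.atHour - b.atHour); // filter() copied the array, so the input is untouched

  let flips = 0;
  let previousSign = 0;
  for (const c of recent) {
    const sign = Math.sign(c.delta);
    if (previousSign !== 0 && sign !== previousSign) flips++;
    previousSign = sign;
  }
  return flips;
}

function isOscillating(changes: ScoreChange[], nowHour: number): boolean {
  return countDirectionChanges(changes, nowHour) >= OSCILLATION_THRESHOLD;
}

// The example above: +20 at hour 0, -40 at hour 4, +30 at hour 8, -40 at hour 12.
const exampleChanges: ScoreChange[] = [
  { atHour: 0, delta: 20 },
  { atHour: 4, delta: -40 },
  { atHour: 8, delta: 30 },
  { atHour: 12, delta: -40 },
];
const exampleTrips = isOscillating(exampleChanges, 12); // true: three direction changes
```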
Risk Accumulator
The risk accumulator is a rolling 24-hour window that tracks the
cumulative severity of failures. Each failure adds P(T) * R to the
accumulator, where:
- P(T) = penalty ratio at the current tier (3 at T0, 10 at T7)
- R = risk multiplier of the failed action
export const RISK_ACCUMULATOR = {
  windowHours: 24,
  warningThreshold: 60,   // Increased monitoring
  degradedThreshold: 120, // Gains frozen
  cbThreshold: 240,       // Circuit breaker trips
};
Accumulator Examples
Scenario 1: T3 agent, multiple MEDIUM failures
P(T3) = 3 + 3 = 6
R(MEDIUM) = 5
Failure 1: accumulator += 6 * 5 = 30 (total: 30, below warning)
Failure 2: accumulator += 30 (total: 60, WARNING triggered)
Failure 3: accumulator += 30 (total: 90, still warning)
Failure 4: accumulator += 30 (total: 120, DEGRADED triggered)
Scenario 2: T7 agent, single LIFE_CRITICAL failure
P(T7) = 3 + 7 = 10
R(LIFE_CRITICAL) = 30
Failure 1: accumulator += 10 * 30 = 300 (total: 300, CIRCUIT BREAKER)
A single LIFE_CRITICAL failure at T7 immediately trips the circuit breaker. This is by design: T7 agents have the most autonomy and must be held to the highest standard.
Scenario 3: T0 agent, CRITICAL failure
P(T0) = 3 + 0 = 3
R(CRITICAL) = 15
Failure 1: accumulator += 3 * 15 = 45 (total: 45, below warning)
Failure 2: accumulator += 45 (total: 90, past WARNING)
Failure 3: accumulator += 45 (total: 135, past DEGRADED)
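Even at T0, three CRITICAL failures within 24 hours trigger degradation. A fourth brings the total to 180, which is still below the default circuit breaker threshold of 240; at this rate it takes a sixth failure (total: 270) to trip the breaker.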
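The accumulator arithmetic in these scenarios can be sketched as a small class. `RiskAccumulator` and its methods are hypothetical; the thresholds are the `RISK_ACCUMULATOR` defaults, the risk multipliers are only the three that appear in the scenarios, and the sketch assumes thresholds are inclusive (a total of exactly 60 triggers the warning, as in Scenario 1) and that entries expire once they are 24 hours old.

```typescript
type RiskLevel = 'MEDIUM' | 'CRITICAL' | 'LIFE_CRITICAL';

// Multipliers taken from the scenarios above; other risk levels are omitted.
const RISK_MULTIPLIER: Record<RiskLevel, number> = {
  MEDIUM: 5,
  CRITICAL: 15,
  LIFE_CRITICAL: 30,
};

const WINDOW_HOURS = 24;
const THRESHOLDS = { warning: 60, degraded: 120, cb: 240 };

type AccumulatorStatus = 'OK' | 'WARNING' | 'DEGRADED' | 'CIRCUIT_BREAKER';

class RiskAccumulator {
  private entries: { atHour: number; weight: number }[] = [];

  // Record a failure: adds P(T) * R, with P(T) = 3 + T for the default ratios.
  recordFailure(tier: number, risk: RiskLevel, atHour: number): void {
    this.entries.push({ atHour, weight: (3 + tier) * RISK_MULTIPLIER[risk] });
  }

  // Sum of the weights that are still inside the rolling window.
  total(nowHour: number): number {
    return this.entries
      .filter((e) => e.atHour <= nowHour && nowHour - e.atHour < WINDOW_HOURS)
      .reduce((sum, e) => sum + e.weight, 0);
  }

  status(nowHour: number): AccumulatorStatus {
    const t = this.total(nowHour);
    if (t >= THRESHOLDS.cb) return 'CIRCUIT_BREAKER';
    if (t >= THRESHOLDS.degraded) return 'DEGRADED';
    if (t >= THRESHOLDS.warning) return 'WARNING';
    return 'OK';
  }
}

// Scenario 1: four MEDIUM failures at T3, one per hour.
const scenario1 = new RiskAccumulator();
for (const hour of [0, 1, 2, 3]) scenario1.recordFailure(3, 'MEDIUM', hour);
const atHour3 = scenario1.status(3);   // 'DEGRADED' (total 120)
const atHour30 = scenario1.status(30); // 'OK' (every entry has left the window)
```

Because the window rolls, an agent that stops failing recovers from the warning and degraded levels on its own as old entries expire.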
Operator Posture Impact
Operators can configure accumulator thresholds through posture presets:
| Posture | Warning | Degraded | CB |
|---------|---------|----------|----|
| STRICT | 40 | 80 | 160 |
| STANDARD | 60 | 120 | 240 |
| PERMISSIVE | 80 | 160 | 320 |
// From canonical.ts
export const OPERATOR_POSTURES = {
  STRICT: {
    accumulatorThresholds: { warning: 40, degraded: 80, cb: 160 },
    // ...
  },
  STANDARD: {
    accumulatorThresholds: { warning: 60, degraded: 120, cb: 240 },
    // ...
  },
  PERMISSIVE: {
    accumulatorThresholds: { warning: 80, degraded: 160, cb: 320 },
    // ...
  },
};
Healthcare and defense deployments should use STRICT. General-purpose deployments can use STANDARD. PERMISSIVE is for development and testing environments where you want agents to have more runway.
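To see how posture changes the outcome for the same accumulator total, here is a minimal sketch. `classify` and `POSTURE_THRESHOLDS` are hypothetical names; the numbers are the preset values from the table above, and thresholds are assumed to be inclusive.

```typescript
type Posture = 'STRICT' | 'STANDARD' | 'PERMISSIVE';
type AccumulatorLevel = 'OK' | 'WARNING' | 'DEGRADED' | 'CIRCUIT_BREAKER';

interface AccumulatorThresholds {
  warning: number;
  degraded: number;
  cb: number;
}

// Values copied from the posture presets above.
const POSTURE_THRESHOLDS: Record<Posture, AccumulatorThresholds> = {
  STRICT: { warning: 40, degraded: 80, cb: 160 },
  STANDARD: { warning: 60, degraded: 120, cb: 240 },
  PERMISSIVE: { warning: 80, degraded: 160, cb: 320 },
};

// Map an accumulator total to a containment level under a given posture.
function classify(total: number, posture: Posture): AccumulatorLevel {
  const t = POSTURE_THRESHOLDS[posture];
  if (total >= t.cb) return 'CIRCUIT_BREAKER';
  if (total >= t.degraded) return 'DEGRADED';
  if (total >= t.warning) return 'WARNING';
  return 'OK';
}

// A total of 90 (three MEDIUM failures at T3) under each posture:
const strictLevel = classify(90, 'STRICT');         // 'DEGRADED'
const standardLevel = classify(90, 'STANDARD');     // 'WARNING'
const permissiveLevel = classify(90, 'PERMISSIVE'); // 'WARNING'
```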
Penalty Ratio by Tier
The penalty ratio P(T) increases linearly with trust tier:
P(T) = penaltyRatioMin + (T/7) * (penaltyRatioMax - penaltyRatioMin)
= 3 + T (with default min=3, max=10)
| Tier | P(T) | Meaning |
|------|------|---------|
| T0 | 3x | Low trust, low penalty -- still learning |
| T1 | 4x | |
| T2 | 5x | |
| T3 | 6x | |
| T4 | 7x | |
| T5 | 8x | High trust demands high accountability |
| T6 | 9x | |
| T7 | 10x | Maximum penalty -- most autonomous, most responsible |
This design embodies a core principle: trust is a responsibility, not a privilege. The more autonomy an agent has, the more severely its failures are penalized. A T7 agent that makes the same mistake as a T0 agent receives roughly 3.3x the penalty (10x versus 3x).
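The linear formula translates directly into code. This is a sketch with hypothetical names (`penaltyRatio`, `MAX_TIER`); the defaults of 3 and 10 are the ones stated above.

```typescript
const PENALTY_RATIO_MIN = 3;
const PENALTY_RATIO_MAX = 10;
const MAX_TIER = 7;

// P(T) = min + (T / 7) * (max - min); with the defaults this reduces to 3 + T.
function penaltyRatio(
  tier: number,
  min: number = PENALTY_RATIO_MIN,
  max: number = PENALTY_RATIO_MAX,
): number {
  if (!Number.isInteger(tier) || tier < 0 || tier > MAX_TIER) {
    throw new RangeError(`tier must be an integer in 0..${MAX_TIER}, got ${tier}`);
  }
  // Multiply before dividing so the default ratios come out as exact integers.
  return min + (tier * (max - min)) / MAX_TIER;
}

const ratioT0 = penaltyRatio(0); // 3
const ratioT3 = penaltyRatio(3); // 6
const ratioT7 = penaltyRatio(7); // 10
```

Taking `min` and `max` as parameters keeps the helper usable if an operator overrides the default ratios.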
Auto-Reset by Tier
Circuit breaker recovery depends on how the agent got there and its current state.
Score-Based Recovery
If the circuit breaker tripped due to score threshold:
- Agent must be manually reinstated by an operator
- Score remains at whatever it was when tripped
- After reinstatement, enters AUDITED state (enhanced monitoring, 2x canary)
HITL Response SLA
When a circuit breaker trips, the HITL escalation timeline activates:
export const HITL_SLA = {
  steps: [
    { hours: 0, action: 'alert_owner' },
    { hours: 4, action: 'reminder' },
    { hours: 24, action: 'escalate_lead' },
    { hours: 72, action: 'escalate_vp' },
    { hours: 168, action: 'auto_retire' },   // 7 days
    { hours: 720, action: 'auto_vanquish' }, // 30 days
  ],
};
If nobody responds within 7 days, the agent is automatically RETIRED. After 30 days with no response, it is VANQUISHED (permanent, irreversible).
Repeat Offender Compression
Repeat circuit breaker trips compress the SLA timeline:
export const HITL_SLA = {
  repeatOffenderMultipliers: [
    1.0, // 1st trip: standard timeline
    0.5, // 2nd trip: halved (auto-retire at 3.5 days)
    0,   // 3rd+ trip: auto-RETIRE on trip (immediate)
  ],
};
On the third circuit breaker trip, the agent is RETIRED immediately, with no human review and no grace period. Three trips indicate a fundamental problem that manual review has failed to resolve.
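The effect of the multipliers on the escalation timeline can be sketched as below. `scheduleForTrip` and `dueActions` are hypothetical helpers. The sketch assumes the multiplier scales every step, not just auto-retire, and that a multiplier of 0 collapses the schedule to a single immediate `auto_retire`.

```typescript
interface SlaStep {
  hours: number; // hours after the trip at which the action fires
  action: string;
}

// Steps copied from the HITL_SLA excerpt above.
const BASE_STEPS: SlaStep[] = [
  { hours: 0, action: 'alert_owner' },
  { hours: 4, action: 'reminder' },
  { hours: 24, action: 'escalate_lead' },
  { hours: 72, action: 'escalate_vp' },
  { hours: 168, action: 'auto_retire' },
  { hours: 720, action: 'auto_vanquish' },
];

// Multipliers from repeatOffenderMultipliers: 1.0, 0.5, then 0 for the 3rd trip onward.
function multiplierForTrip(tripNumber: number): number {
  if (!Number.isInteger(tripNumber) || tripNumber < 1) {
    throw new RangeError(`tripNumber must be a positive integer, got ${tripNumber}`);
  }
  if (tripNumber === 1) return 1.0;
  if (tripNumber === 2) return 0.5;
  return 0;
}

// Build the escalation schedule for the agent's Nth trip.
function scheduleForTrip(tripNumber: number): SlaStep[] {
  const multiplier = multiplierForTrip(tripNumber);
  // Assumption: multiplier 0 means immediate retirement and nothing else.
  if (multiplier === 0) return [{ hours: 0, action: 'auto_retire' }];
  return BASE_STEPS.map((s) => ({ hours: s.hours * multiplier, action: s.action }));
}

// Actions that should already have fired a given number of hours after the trip.
function dueActions(tripNumber: number, hoursSinceTrip: number): string[] {
  return scheduleForTrip(tripNumber)
    .filter((s) => s.hours <= hoursSinceTrip)
    .map((s) => s.action);
}
```

Under these assumptions, `scheduleForTrip(2)` places `auto_retire` at hour 84, which matches the 3.5 days noted in the excerpt.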
Max Trips Before Auto-Retire
Operators configure the maximum CB trips before automatic retirement:
| Posture | Max Trips |
|---------|-----------|
| STRICT | 2 |
| STANDARD | 3 |
| PERMISSIVE | 5 |
Infrastructure Circuit Breaker
Separately from the trust-governance circuit breaker, the infrastructure
circuit breaker (packages/security/src/common/circuit-breaker.ts) protects
against cascading failures in external services:
// Per-service configurations
const CIRCUIT_BREAKER_CONFIGS = {
  database: { failureThreshold: 5, resetTimeoutMs: 30000 },
  redis: { failureThreshold: 10, resetTimeoutMs: 10000 },
  webhook: { failureThreshold: 3, resetTimeoutMs: 60000 },
  policyEngine: { failureThreshold: 5, resetTimeoutMs: 15000 },
  trustEngine: { failureThreshold: 5, resetTimeoutMs: 30000 },
  auditService: { failureThreshold: 10, resetTimeoutMs: 15000 },
};
This uses the traditional CLOSED -> OPEN -> HALF_OPEN pattern:
- CLOSED: Normal operation, tracking failures
- OPEN: Fast-fail all requests, no external calls
- HALF_OPEN: Allow one test request after timeout
The infrastructure CB prevents the governance system itself from failing due to downstream service outages. The canary probe service uses infrastructure error classification to avoid false behavioral failures from infrastructure problems.
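The CLOSED -> OPEN -> HALF_OPEN cycle can be sketched generically as follows. This illustrates the traditional pattern and is not the contents of `circuit-breaker.ts`: the class and method names are hypothetical, and only the two config fields (`failureThreshold`, `resetTimeoutMs`) come from the excerpt above. The clock is injectable so the timeout can be exercised without waiting.

```typescript
type InfraState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class InfraCircuitBreaker {
  private state: InfraState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;
  private probeInFlight = false;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Returns true if the caller may issue a request right now.
  allowRequest(): boolean {
    // OPEN -> HALF_OPEN once the reset timeout has elapsed.
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN';
      this.probeInFlight = false;
    }
    if (this.state === 'CLOSED') return true;
    if (this.state === 'HALF_OPEN' && !this.probeInFlight) {
      this.probeInFlight = true; // exactly one test request is let through
      return true;
    }
    return false; // OPEN, or HALF_OPEN with a probe already running: fail fast
  }

  // Any success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED';
    this.probeInFlight = false;
  }

  // A failed probe, or too many failures while CLOSED, opens the breaker.
  recordFailure(): void {
    this.failures++;
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
      this.probeInFlight = false;
    }
  }

  getState(): InfraState {
    return this.state;
  }
}
```

A caller would check `allowRequest()` before each external call and report the outcome with `recordSuccess()` or `recordFailure()`; with the webhook settings shown above, that means `new InfraCircuitBreaker(3, 60000)`.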
Adaptive Circuit Breaker
The adaptive circuit breaker (apps/agentanchor/lib/security/adaptive-circuit-breaker.ts)
adds ML-driven anomaly detection on top of the threshold-based system:
interface AnomalyScore {
  overall: number;       // 0-1, where 1 = highly anomalous
  components: {
    statistical: number; // Z-score based
    sequential: number;  // Pattern deviation
    ruleBased: number;   // Hard limit violations
  };
  factors: string[];     // Contributing factors
}
It learns normal agent behavior patterns (action frequency, resource utilization, error rates, token usage) and detects deviations using statistical methods. When the anomaly score exceeds the threshold, it triggers graduated containment.
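As an illustration of the Z-score based `statistical` component only, here is a minimal sketch. Everything in it is an assumption rather than the adaptive circuit breaker's actual logic: the helper names, the use of a population standard deviation over a fixed baseline, and the choice to map |z| >= 3 to a score of 1. How the three components combine into `overall` is not specified above and is not modeled here.

```typescript
// Z-score of a new observation against a baseline of past observations.
function zScore(value: number, baseline: number[]): number {
  if (baseline.length === 0) throw new RangeError('baseline must not be empty');
  const mean = baseline.reduce((sum, x) => sum + x, 0) / baseline.length;
  const variance = baseline.reduce((sum, x) => sum + (x - mean) ** 2, 0) / baseline.length;
  const std = Math.sqrt(variance);
  if (std === 0) {
    // A perfectly constant baseline: any deviation at all is maximally surprising.
    return value === mean ? 0 : Math.sign(value - mean) * Infinity;
  }
  return (value - mean) / std;
}

// Hypothetical squashing into the 0-1 range used by AnomalyScore:
// |z| grows linearly until it saturates at 3 standard deviations.
function statisticalComponent(value: number, baseline: number[]): number {
  return Math.min(Math.abs(zScore(value, baseline)) / 3, 1);
}

// Example: an error rate baseline of 9 and 11 per hour (mean 10, std 1).
const typical = statisticalComponent(10, [9, 11]);    // 0
const elevated = statisticalComponent(11.5, [9, 11]); // 0.5
const extreme = statisticalComponent(20, [9, 11]);    // 1
```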
Putting It Together
A complete failure scenario showing all CB mechanisms:
Hour 0: Agent at T4 (score 700)
        ETHICAL canary probe fails
        -> Loss: -P(4) * R(CRITICAL) * gainRate * ln(1+C/2)
        -> Risk accumulator: +7 * 15 = 105 (past warning, approaching degraded)
        -> Methodology: ETHICAL failure 1/3
Hour 2: Second ETHICAL failure
        -> Risk accumulator: +105 (total: 210, past degraded, gains frozen)
        -> Methodology: ETHICAL failure 2/3
        -> Score dropping, approaching 500
Hour 4: Third ETHICAL failure
        -> Methodology: 3 same-method in 72h -> CIRCUIT BREAKER TRIPS
        -> Risk accumulator: 315 (also past CB threshold)
        -> Agent fully blocked
        -> HITL SLA activates: owner alerted
Hour 4: Operator reviews, finds data poisoning in RAG corpus
        -> Cleans corpus, reinstates agent
        -> Agent enters AUDITED state (2x canary rate)
        -> Degradation controller: probationary period (14 days, 3x lambda)
Recommended Actions
- Use STANDARD posture for production, STRICT for regulated industries
- Set up PagerDuty/Slack alerts for all CB state transitions
- Review methodology failure patterns weekly -- they reveal systematic issues
- Monitor risk accumulator trends as a leading indicator of CB trips
- Establish a runbook for CB reinstatement: root-cause analysis, remediation, operator sign-off, post-reinstatement monitoring plan
Next Steps
- Canary Probes -- The probe system that feeds into CB decisions
- Cognitive Envelope -- Model-level anomaly detection
- ParameSphere Fingerprinting -- Weight integrity monitoring