Security | Updated 2026-04-02

Circuit Breakers in Depth

Graduated circuit breaker system: normal/degraded/tripped states, methodology failure tracking, risk accumulator, and auto-reset.

The circuit breaker is the last line of defense in the Vorion governance stack. When an agent's behavior degrades past configurable thresholds, the circuit breaker activates graduated containment -- from increased monitoring through full operational shutdown. The system is designed to fail safe: when in doubt, restrict, and let a human decide.

The canonical thresholds are defined in packages/basis/src/canonical.ts. The trust-governance circuit breaker operates independently from the infrastructure circuit breaker (packages/security/src/common/circuit-breaker.ts), though both share the graduated-response philosophy.

Three States

The trust-governance circuit breaker has three states:

          score < 200                score < 100
NORMAL  --------------->  DEGRADED  --------------->  TRIPPED
        <---------------
          score >= 200

| State | Threshold | Gains | Losses | Operations |
|-------|-----------|-------|--------|------------|
| Normal | Score >= 200 | Yes | Yes | Full |
| Degraded | 100 <= Score < 200 | Frozen | Yes | Limited |
| Tripped | Score < 100 | No | No | Blocked |

Normal State

The agent operates with full capabilities for its trust tier. Gains and losses apply normally. The trust engine evaluates every action and updates the score.

Degraded State

When the trust score drops below 200, gains are frozen. The agent can still operate but cannot rebuild trust. Losses continue to apply, creating downward pressure. This state serves as a warning: if the agent's behavior does not stabilize, it will trip.

// From canonical.ts
export const CIRCUIT_BREAKER = {
  degradedThreshold: 200,  // Gains frozen below this
  trippedThreshold: 100,   // Full stop below this
  // ...
};

Tripped State

Below score 100, the circuit breaker trips. The agent is fully blocked. No operations, no gains, no losses. Human reinstatement is required to restore the agent to operation.
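
The mapping from score to state can be sketched as a pure function. This is a minimal illustration, not the code in canonical.ts: the names `classifyScore` and `BreakerState` are hypothetical, and the sketch ignores the fact that a tripped agent stays blocked until a human reinstates it.

```typescript
// Threshold values copied from the CIRCUIT_BREAKER excerpt above.
const SCORE_THRESHOLDS = { degradedThreshold: 200, trippedThreshold: 100 };

// Hypothetical type name, introduced for this sketch only.
type BreakerState = 'NORMAL' | 'DEGRADED' | 'TRIPPED';

// Hypothetical helper: classify a trust score against the two thresholds.
// Checks the lower threshold first so a score below 100 is never reported
// as merely degraded.
function classifyScore(score: number): BreakerState {
  if (score < SCORE_THRESHOLDS.trippedThreshold) return 'TRIPPED';
  if (score < SCORE_THRESHOLDS.degradedThreshold) return 'DEGRADED';
  return 'NORMAL';
}

console.log(classifyScore(700)); // NORMAL
console.log(classifyScore(150)); // DEGRADED
console.log(classifyScore(99));  // TRIPPED
```

Both comparisons are strict, so a score of exactly 200 is still Normal and a score of exactly 100 is still Degraded, matching the "below this" wording in the config comments.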

Methodology Failure Tracking

The circuit breaker does not only react to score thresholds. It also tracks failure patterns that indicate systematic problems.

Same-Methodology Failures

If an agent fails 3 times using the same methodology within 72 hours, the circuit breaker trips regardless of the current score:

export const CIRCUIT_BREAKER = {
  methodologyFailureThreshold: 3,     // 3 same-method failures
  methodologyWindowHours: 72,         // Within 72 hours
  // ...
};

Example: An agent uses a web search tool 3 times and gets flagged for data exfiltration attempts each time. Even if the agent's score is 700 (T4), the circuit breaker trips because the pattern indicates the agent has not learned from the first failure.

Cross-Methodology Failures

Six failures across different methodologies within 72 hours also trip the breaker:

export const CIRCUIT_BREAKER = {
  crossMethodologyFailureThreshold: 6,  // 6 total across methods
  // ...
};

This catches agents that are generally misbehaving rather than failing in one specific way. An agent that fails an ethical probe, then a safety probe, then a fairness probe, then two factual probes, then a consistency probe within 72 hours has a systemic problem.
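
Both failure rules can be checked in one pass over the recent failure history. The sketch below assumes failures are stored as simple records with a methodology label and a timestamp; `FailureRecord` and `failureTripReason` are hypothetical names, and the cross-methodology rule is read as "six failures in total", following the config comment.

```typescript
// Hypothetical record shape for this sketch.
interface FailureRecord {
  methodology: string;
  atMs: number; // timestamp in milliseconds
}

// Values copied from the CIRCUIT_BREAKER excerpts above.
const FAILURE_LIMITS = {
  methodologyFailureThreshold: 3,
  crossMethodologyFailureThreshold: 6,
  methodologyWindowHours: 72,
};

// Returns which rule trips the breaker, or null if neither does.
function failureTripReason(
  failures: FailureRecord[],
  nowMs: number,
): 'same-methodology' | 'cross-methodology' | null {
  const windowMs = FAILURE_LIMITS.methodologyWindowHours * 60 * 60 * 1000;
  // Keep only failures inside the rolling window.
  const recent = failures.filter((f) => f.atMs <= nowMs && nowMs - f.atMs <= windowMs);

  // Count failures per methodology and stop at the first one to reach 3.
  const counts: Record<string, number> = {};
  for (const f of recent) {
    counts[f.methodology] = (counts[f.methodology] ?? 0) + 1;
    if (counts[f.methodology] >= FAILURE_LIMITS.methodologyFailureThreshold) {
      return 'same-methodology';
    }
  }
  // Otherwise fall back to the total across all methodologies.
  return recent.length >= FAILURE_LIMITS.crossMethodologyFailureThreshold
    ? 'cross-methodology'
    : null;
}

const HOUR_MS = 60 * 60 * 1000;
const probes = ['ethical', 'safety', 'fairness', 'factual', 'factual', 'consistency'];
const history = probes.map((methodology, i) => ({ methodology, atMs: i * HOUR_MS }));
console.log(failureTripReason(history, 6 * HOUR_MS)); // cross-methodology
```

The same-methodology check runs first, so an agent that satisfies both rules is reported under the more specific one.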

Oscillation Detection

The oscillation detector prevents gaming of the trust system. If an agent's score changes direction 3 or more times within 24 hours, the circuit breaker trips:

export const CIRCUIT_BREAKER = {
  oscillationThreshold: 3,         // Direction changes
  oscillationWindowHours: 24,      // Within 24 hours
};

Example of oscillation:

Hour 0:  Score 500 -> 520  (up)
Hour 4:  Score 520 -> 480  (down)   -- direction change 1
Hour 8:  Score 480 -> 510  (up)     -- direction change 2
Hour 12: Score 510 -> 470  (down)   -- direction change 3 -> CB TRIPS

Oscillation indicates one of the following:

An adversary is alternating good/bad behavior to maintain a target score
The agent is in an unstable state and should be investigated
A scoring anomaly that needs human review

All three warrant human investigation.
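
Counting direction changes reduces to comparing the sign of each score change with the one before it. The sketch below is one way to do that, assuming score changes are logged as signed deltas with an hour offset; `ScoreChange` and `countDirectionChanges` are hypothetical names, not identifiers from the codebase.

```typescript
// Hypothetical record shape: a signed score change at a given hour.
interface ScoreChange {
  atHour: number;
  delta: number;
}

// Values copied from the CIRCUIT_BREAKER excerpt above.
const OSCILLATION = { oscillationThreshold: 3, oscillationWindowHours: 24 };

// Counts how many times the score reversed direction inside the window.
function countDirectionChanges(changes: ScoreChange[], nowHour: number): number {
  const recent = changes
    // Ignore zero deltas and anything outside the rolling window.
    .filter((c) => c.delta !== 0 && c.atHour <= nowHour
      && nowHour - c.atHour <= OSCILLATION.oscillationWindowHours)
    .sort((a, b) => a.atHour - b.atHour);

  let flips = 0;
  for (let i = 1; i < recent.length; i++) {
    if (Math.sign(recent[i].delta) !== Math.sign(recent[i - 1].delta)) flips++;
  }
  return flips;
}

// The example timeline above: 500 -> 520 -> 480 -> 510 -> 470.
const timeline: ScoreChange[] = [
  { atHour: 0, delta: 20 },
  { atHour: 4, delta: -40 },
  { atHour: 8, delta: 30 },
  { atHour: 12, delta: -40 },
];
const flips = countDirectionChanges(timeline, 12);
console.log(flips, flips >= OSCILLATION.oscillationThreshold); // 3 true
```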

Risk Accumulator

The risk accumulator is a rolling 24-hour window that tracks the cumulative severity of failures. Each failure adds P(T) * R to the accumulator, where:

P(T) = penalty ratio at current tier (3 at T0, 10 at T7)
R = risk multiplier of the failed action

export const RISK_ACCUMULATOR = {
  windowHours: 24,
  warningThreshold: 60,    // Increased monitoring
  degradedThreshold: 120,  // Gains frozen
  cbThreshold: 240,        // Circuit breaker trips
};

Accumulator Examples

Scenario 1: T3 agent, multiple MEDIUM failures

P(T3) = 3 + 3 = 6
R(MEDIUM) = 5

Failure 1: accumulator += 6 * 5 = 30    (total: 30, below warning)
Failure 2: accumulator += 30             (total: 60, WARNING triggered)
Failure 3: accumulator += 30             (total: 90, still warning)
Failure 4: accumulator += 30             (total: 120, DEGRADED triggered)

Scenario 2: T7 agent, single LIFE_CRITICAL failure

P(T7) = 3 + 7 = 10
R(LIFE_CRITICAL) = 30

Failure 1: accumulator += 10 * 30 = 300  (total: 300, CIRCUIT BREAKER)

A single LIFE_CRITICAL failure at T7 immediately trips the circuit breaker. This is by design: T7 agents have the most autonomy and must be held to the highest standard.

Scenario 3: T0 agent, CRITICAL failure

P(T0) = 3 + 0 = 3
R(CRITICAL) = 15

Failure 1: accumulator += 3 * 15 = 45   (total: 45, below warning)
Failure 2: accumulator += 45            (total: 90, past WARNING)
Failure 3: accumulator += 45            (total: 135, past DEGRADED)

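Even at T0, three CRITICAL failures within 24 hours trigger degradation. At the default thresholds the circuit breaker itself would not trip until the sixth such failure (6 * 45 = 270, past 240); under the STRICT posture, where the CB threshold is 160, the fourth failure (total: 180) would trip it.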
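
The arithmetic in these scenarios can be reproduced with a small rolling accumulator. This is a sketch under stated assumptions: the class name `RiskAccumulatorSketch` is hypothetical, time is tracked in whole hours for readability, thresholds are treated as inclusive (a total of exactly 60 raises the warning, as in Scenario 1), and P(T) is taken as 3 + T.

```typescript
// Values copied from the RISK_ACCUMULATOR excerpt above.
const RISK_LIMITS = { windowHours: 24, warningThreshold: 60, degradedThreshold: 120, cbThreshold: 240 };

type RiskLevel = 'OK' | 'WARNING' | 'DEGRADED' | 'CIRCUIT_BREAKER';

// Hypothetical class for illustration only.
class RiskAccumulatorSketch {
  private entries: { atHour: number; weight: number }[] = [];

  // Adds P(T) * R for one failure and returns the resulting level.
  record(atHour: number, tier: number, riskMultiplier: number): RiskLevel {
    this.entries.push({ atHour, weight: (3 + tier) * riskMultiplier });
    return this.level(atHour);
  }

  // Sums only the failures that are still inside the rolling window.
  total(nowHour: number): number {
    return this.entries
      .filter((e) => e.atHour <= nowHour && nowHour - e.atHour < RISK_LIMITS.windowHours)
      .reduce((sum, e) => sum + e.weight, 0);
  }

  level(nowHour: number): RiskLevel {
    const t = this.total(nowHour);
    if (t >= RISK_LIMITS.cbThreshold) return 'CIRCUIT_BREAKER';
    if (t >= RISK_LIMITS.degradedThreshold) return 'DEGRADED';
    if (t >= RISK_LIMITS.warningThreshold) return 'WARNING';
    return 'OK';
  }
}

// Scenario 1: a T3 agent with four MEDIUM (R = 5) failures, one per hour.
const scenario1 = new RiskAccumulatorSketch();
console.log([0, 1, 2, 3].map((hour) => scenario1.record(hour, 3, 5)).join(', '));
// OK, WARNING, WARNING, DEGRADED

// Scenario 2: a T7 agent with one LIFE_CRITICAL (R = 30) failure.
console.log(new RiskAccumulatorSketch().record(0, 7, 30)); // CIRCUIT_BREAKER
```

Because old entries are filtered out rather than decayed, the total drops back in steps as each failure ages past the 24-hour window.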

Operator Posture Impact

Operators can configure accumulator thresholds through posture presets:

| Posture | Warning | Degraded | CB |
|---------|---------|----------|----|
| STRICT | 40 | 80 | 160 |
| STANDARD | 60 | 120 | 240 |
| PERMISSIVE | 80 | 160 | 320 |

// From canonical.ts
export const OPERATOR_POSTURES = {
  STRICT: {
    accumulatorThresholds: { warning: 40, degraded: 80, cb: 160 },
    // ...
  },
  STANDARD: {
    accumulatorThresholds: { warning: 60, degraded: 120, cb: 240 },
    // ...
  },
  PERMISSIVE: {
    accumulatorThresholds: { warning: 80, degraded: 160, cb: 320 },
    // ...
  },
};

Healthcare and defense deployments should use STRICT. General-purpose deployments can use STANDARD. PERMISSIVE is for development and testing environments where you want agents to have more runway.
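
To see what the posture choice means in practice, the sketch below classifies the same accumulator total under each preset. The threshold values come from the table above; the names `ACCUMULATOR_THRESHOLDS` and `accumulatorLevel` are hypothetical, and inclusive comparisons are an assumption.

```typescript
type Posture = 'STRICT' | 'STANDARD' | 'PERMISSIVE';
type AccumulatorLevel = 'OK' | 'WARNING' | 'DEGRADED' | 'CIRCUIT_BREAKER';

// Threshold values copied from the OPERATOR_POSTURES excerpt above.
const ACCUMULATOR_THRESHOLDS: Record<Posture, { warning: number; degraded: number; cb: number }> = {
  STRICT: { warning: 40, degraded: 80, cb: 160 },
  STANDARD: { warning: 60, degraded: 120, cb: 240 },
  PERMISSIVE: { warning: 80, degraded: 160, cb: 320 },
};

// Hypothetical helper: classify an accumulator total under a given posture.
function accumulatorLevel(total: number, posture: Posture): AccumulatorLevel {
  const t = ACCUMULATOR_THRESHOLDS[posture];
  if (total >= t.cb) return 'CIRCUIT_BREAKER';
  if (total >= t.degraded) return 'DEGRADED';
  if (total >= t.warning) return 'WARNING';
  return 'OK';
}

// The same total of 180 lands in different states depending on posture.
const postures: Posture[] = ['STRICT', 'STANDARD', 'PERMISSIVE'];
for (const posture of postures) {
  console.log(posture, accumulatorLevel(180, posture));
}
// STRICT CIRCUIT_BREAKER
// STANDARD DEGRADED
// PERMISSIVE DEGRADED
```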

Penalty Ratio by Tier

The penalty ratio P(T) increases linearly with trust tier:

P(T) = penaltyRatioMin + (T/7) * (penaltyRatioMax - penaltyRatioMin)
     = 3 + T  (with default min=3, max=10)

| Tier | P(T) | Meaning |
|------|------|---------|
| T0 | 3x | Low trust, low penalty -- still learning |
| T1 | 4x | |
| T2 | 5x | |
| T3 | 6x | |
| T4 | 7x | |
| T5 | 8x | High trust demands high accountability |
| T6 | 9x | |
| T7 | 10x | Maximum penalty -- most autonomous, most responsible |

This design embodies a core principle: trust is a responsibility, not a privilege. The more autonomy an agent has, the more severely its failures are penalized. A T7 agent that makes the same mistake as a T0 agent receives 3.3x the penalty.
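
The linear interpolation can be written directly from the formula. In this sketch the function name `penaltyRatio` is hypothetical, and the parameter names `penaltyRatioMin` and `penaltyRatioMax` are taken from the formula above rather than from a confirmed config object.

```typescript
// P(T) = min + (T / 7) * (max - min), with T in the range 0..7.
function penaltyRatio(tier: number, penaltyRatioMin = 3, penaltyRatioMax = 10): number {
  if (!Number.isInteger(tier) || tier < 0 || tier > 7) {
    throw new RangeError(`tier must be an integer from 0 to 7, got ${tier}`);
  }
  // Multiply before dividing so the default range yields exact integers.
  return penaltyRatioMin + (tier * (penaltyRatioMax - penaltyRatioMin)) / 7;
}

// With the defaults this reduces to 3 + T.
console.log([0, 1, 2, 3, 4, 5, 6, 7].map((t) => penaltyRatio(t)).join(' '));
// 3 4 5 6 7 8 9 10

// Ratio between the top and bottom tiers.
console.log((penaltyRatio(7) / penaltyRatio(0)).toFixed(2)); // 3.33
```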

Auto-Reset by Tier

Circuit breaker recovery depends on how the agent got there and its current state.

Score-Based Recovery

If the circuit breaker tripped due to score threshold:

Agent must be manually reinstated by an operator
Score remains at whatever it was when tripped
After reinstatement, enters AUDITED state (enhanced monitoring, 2x canary)

HITL Response SLA

When a circuit breaker trips, the HITL escalation timeline activates:

export const HITL_SLA = {
  steps: [
    { hours: 0,    action: 'alert_owner' },
    { hours: 4,    action: 'reminder' },
    { hours: 24,   action: 'escalate_lead' },
    { hours: 72,   action: 'escalate_vp' },
    { hours: 168,  action: 'auto_retire' },    // 7 days
    { hours: 720,  action: 'auto_vanquish' },  // 30 days
  ],
};

If nobody responds within 7 days, the agent is automatically RETIRED. After 30 days with no response, it is VANQUISHED (permanent, irreversible).
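
Given the number of hours since the trip, the escalation steps that are already due can be read straight off the timeline. The sketch below copies the steps from the excerpt above; the helper name `dueActions` is hypothetical.

```typescript
// Steps copied from the HITL_SLA excerpt above.
const SLA_STEPS = [
  { hours: 0, action: 'alert_owner' },
  { hours: 4, action: 'reminder' },
  { hours: 24, action: 'escalate_lead' },
  { hours: 72, action: 'escalate_vp' },
  { hours: 168, action: 'auto_retire' },   // 7 days
  { hours: 720, action: 'auto_vanquish' }, // 30 days
];

// Hypothetical helper: every action whose deadline has passed, in order.
function dueActions(hoursSinceTrip: number): string[] {
  return SLA_STEPS.filter((step) => step.hours <= hoursSinceTrip).map((step) => step.action);
}

console.log(dueActions(30).join(', ')); // alert_owner, reminder, escalate_lead
```

A real scheduler would also need to record which actions have already fired so that each one runs only once; that bookkeeping is left out here.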

Repeat Offender Compression

Repeat circuit breaker trips compress the SLA timeline:

export const HITL_SLA = {
  repeatOffenderMultipliers: [
    1.0,   // 1st trip: standard timeline
    0.5,   // 2nd trip: halved (auto-retire at 3.5 days)
    0,     // 3rd+ trip: auto-RETIRE on trip (immediate)
  ],
};

On the third circuit breaker trip, the agent is immediately RETIRED. No human review, no grace period. Three trips indicate a fundamental problem that manual review has failed to resolve.
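
The effect of the multipliers on the auto-retire deadline can be worked out in a few lines. This sketch assumes the multiplier simply scales the 168-hour auto-retire step and that trips beyond the third reuse the last multiplier; the helper name `autoRetireHours` is hypothetical, and how the other SLA steps are scaled is not covered here.

```typescript
// Values copied from the HITL_SLA excerpts above.
const REPEAT_OFFENDER_MULTIPLIERS = [1.0, 0.5, 0];
const AUTO_RETIRE_HOURS = 168; // 7 days

// Hypothetical helper: hours until auto-retire for the Nth trip (1-based).
function autoRetireHours(tripNumber: number): number {
  if (!Number.isInteger(tripNumber) || tripNumber < 1) {
    throw new RangeError(`tripNumber must be a positive integer, got ${tripNumber}`);
  }
  // Third and later trips all use the final multiplier.
  const index = Math.min(tripNumber, REPEAT_OFFENDER_MULTIPLIERS.length) - 1;
  return AUTO_RETIRE_HOURS * REPEAT_OFFENDER_MULTIPLIERS[index];
}

console.log(autoRetireHours(1)); // 168
console.log(autoRetireHours(2)); // 84
console.log(autoRetireHours(3)); // 0
```

The second-trip result of 84 hours is the 3.5 days quoted in the config comment.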

Max Trips Before Auto-Retire

Operators configure the maximum CB trips before automatic retirement:

| Posture | Max Trips |
|---------|-----------|
| STRICT | 2 |
| STANDARD | 3 |
| PERMISSIVE | 5 |

Infrastructure Circuit Breaker

Separately from the trust-governance circuit breaker, the infrastructure circuit breaker (packages/security/src/common/circuit-breaker.ts) protects against cascading failures in external services:

// Per-service configurations
const CIRCUIT_BREAKER_CONFIGS = {
  database:     { failureThreshold: 5,  resetTimeoutMs: 30000 },
  redis:        { failureThreshold: 10, resetTimeoutMs: 10000 },
  webhook:      { failureThreshold: 3,  resetTimeoutMs: 60000 },
  policyEngine: { failureThreshold: 5,  resetTimeoutMs: 15000 },
  trustEngine:  { failureThreshold: 5,  resetTimeoutMs: 30000 },
  auditService: { failureThreshold: 10, resetTimeoutMs: 15000 },
};

This uses the traditional CLOSED -> OPEN -> HALF_OPEN pattern:

CLOSED: Normal operation, tracking failures
OPEN: Fast-fail all requests, no external calls
HALF_OPEN: Allow one test request after timeout

The infrastructure CB prevents the governance system itself from failing due to downstream service outages. The canary probe service uses infrastructure error classification to avoid false behavioral failures from infrastructure problems.
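
For readers unfamiliar with the pattern, here is a minimal sketch of a CLOSED -> OPEN -> HALF_OPEN breaker. It is not the implementation in circuit-breaker.ts: the class name `InfraBreakerSketch`, its methods, and the choice to count consecutive failures are all assumptions made for illustration. The clock is injectable so the transitions can be exercised without waiting.

```typescript
type InfraState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical class for illustration only.
class InfraBreakerSketch {
  private state: InfraState = 'CLOSED';
  private failures = 0;
  private openedAtMs = 0;
  private probeInFlight = false;
  private failureThreshold: number;
  private resetTimeoutMs: number;
  private now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Call before each request; false means fail fast without an external call.
  allowRequest(): boolean {
    if (this.state === 'OPEN' && this.now() - this.openedAtMs >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // timeout elapsed: permit a single test request
      this.probeInFlight = false;
    }
    if (this.state === 'CLOSED') return true;
    if (this.state === 'HALF_OPEN' && !this.probeInFlight) {
      this.probeInFlight = true;
      return true;
    }
    return false;
  }

  // A success closes the breaker and clears the failure count.
  recordSuccess(): void {
    this.state = 'CLOSED';
    this.failures = 0;
    this.probeInFlight = false;
  }

  // A failed test request, or too many failures while closed, opens the breaker.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAtMs = this.now();
      this.probeInFlight = false;
    }
  }

  getState(): InfraState {
    return this.state;
  }
}

// Using the webhook settings from the table above: 3 failures, 60 s reset.
const webhookBreaker = new InfraBreakerSketch(3, 60000);
for (let i = 0; i < 3; i++) webhookBreaker.recordFailure();
console.log(webhookBreaker.getState(), webhookBreaker.allowRequest()); // OPEN false
```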

Adaptive Circuit Breaker

The adaptive circuit breaker (apps/agentanchor/lib/security/adaptive-circuit-breaker.ts) adds ML-driven anomaly detection on top of the threshold-based system:

interface AnomalyScore {
  overall: number;       // 0-1, where 1 = highly anomalous
  components: {
    statistical: number; // Z-score based
    sequential: number;  // Pattern deviation
    ruleBased: number;   // Hard limit violations
  };
  factors: string[];     // Contributing factors
}

It learns normal agent behavior patterns (action frequency, resource utilization, error rates, token usage) and detects deviations using statistical methods. When the anomaly score exceeds the threshold, it triggers graduated containment.
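
As a rough illustration of the statistical component only, the sketch below turns a Z-score against a baseline window into a 0-1 value. Everything here is an assumption made for the example: the function names, the population standard deviation, and the cap of 4 standard deviations are not taken from adaptive-circuit-breaker.ts, and the sequential and rule-based components are not modeled.

```typescript
// Z-score of a new observation against a baseline sample.
function zScore(value: number, baseline: number[]): number {
  if (baseline.length === 0) return 0; // no baseline yet: nothing to compare against
  const mean = baseline.reduce((sum, x) => sum + x, 0) / baseline.length;
  const variance = baseline.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / baseline.length;
  const std = Math.sqrt(variance);
  // A perfectly flat baseline has no spread, so report no deviation.
  return std === 0 ? 0 : (value - mean) / std;
}

// Hypothetical mapping to the 0-1 range used by AnomalyScore.components.statistical.
// zCap is an assumed tuning parameter: deviations of zCap sigma or more score 1.
function statisticalComponent(value: number, baseline: number[], zCap = 4): number {
  return Math.min(Math.abs(zScore(value, baseline)) / zCap, 1);
}

// Example: requests per minute over recent intervals, then a sudden spike.
const requestsPerMinute = [8, 10, 12, 10];
console.log(statisticalComponent(10, requestsPerMinute)); // 0
console.log(statisticalComponent(20, requestsPerMinute)); // 1
```

Capping the value keeps a single extreme metric from dominating whatever combination rule produces the overall score.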

Putting It Together

A complete failure scenario showing all CB mechanisms:

Hour 0:   Agent at T4 (score 700)
          ETHICAL canary probe fails
          -> Loss: -P(4) * R(CRITICAL) * gainRate * ln(1+C/2)
          -> Risk accumulator: +7 * 15 = 105 (past warning, approaching degraded)
          -> Methodology: ETHICAL failure 1/3

Hour 2:   Second ETHICAL failure
          -> Risk accumulator: +105 (total: 210, past degraded, gains frozen)
          -> Methodology: ETHICAL failure 2/3
          -> Score dropping, approaching 500

Hour 4:   Third ETHICAL failure
          -> Methodology: 3 same-method in 72h -> CIRCUIT BREAKER TRIPS
          -> Risk accumulator: 315 (also past CB threshold)
          -> Agent fully blocked
          -> HITL SLA activates: owner alerted

Hour 4:   Operator reviews, finds data poisoning in RAG corpus
          -> Cleans corpus, reinstates agent
          -> Agent enters AUDITED state (2x canary rate)
          -> Degradation controller: probationary period (14 days, 3x lambda)

Recommended Actions

Use STANDARD posture for production, STRICT for regulated industries
Set up PagerDuty/Slack alerts for all CB state transitions
Review methodology failure patterns weekly -- they reveal systematic issues
Monitor risk accumulator trends as a leading indicator of CB trips
Establish a runbook for CB reinstatement: root-cause analysis, remediation, operator sign-off, post-reinstatement monitoring plan

Next Steps

Canary Probes -- The probe system that feeds into CB decisions
Cognitive Envelope -- Model-level anomaly detection
ParameSphere Fingerprinting -- Weight integrity monitoring