Trust Model
Updated 2026-04-02

Observation Tiers

BLACK_BOX through VERIFIED_BOX — how inspectability determines trust ceilings and the path to full autonomy.

The observation tier is the single most important factor limiting an agent's maximum trust. It answers a simple question: how deeply can we inspect what this agent is doing? The more transparent the agent, the higher its trust ceiling.

Core Principle

"You cannot fully trust what you cannot fully inspect."

This is the design philosophy behind observation tiers. An agent that operates as a black box — where you can only see inputs and outputs — can never achieve the same trust as one whose internal reasoning is fully verifiable.

The Five Tiers

// From canonical.ts
const OBSERVATION_TIERS = {
  BLACK_BOX:    { ceiling: 600,  maxTier: 'T3' },
  GRAY_BOX:     { ceiling: 750,  maxTier: 'T4' },
  WHITE_BOX:    { ceiling: 900,  maxTier: 'T6' },
  ATTESTED_BOX: { ceiling: 950,  maxTier: 'T6' },
  VERIFIED_BOX: { ceiling: 1000, maxTier: 'T7' },
};

| Tier         | Ceiling | Max Trust Tier | What You Can See              |
|--------------|---------|----------------|-------------------------------|
| BLACK_BOX    | 600     | T3 Monitored   | Inputs and outputs only       |
| GRAY_BOX     | 750     | T4 Standard    | I/O plus execution logs       |
| WHITE_BOX    | 900     | T6 Certified   | Full source code access       |
| ATTESTED_BOX | 950     | T6 Certified   | TEE-verified execution        |
| VERIFIED_BOX | 1000    | T7 Autonomous  | TEE + zkML + interpretability |

BLACK_BOX (Ceiling: 600)

Description: I/O only — proprietary API models.

This is the default for most commercial AI models accessed through APIs (GPT-4, Claude, Gemini when used as a hosted service). You send a prompt, you get a response. You cannot inspect the model weights, the training data, the inference process, or the internal reasoning.

What you can observe:

Request/response pairs
Latency and error rates
Output quality (via canary probes)

What you cannot observe:

Model internals
Decision-making process
Data handling practices
Whether the model has been modified since last evaluation

Maximum trust: Score 600, Tier T3 (Monitored). This is enough for standard operational tasks under continuous audit, but the agent will never access HIGH-risk operations (which require trust 600+ and T4+ capabilities).

Practical implication: A BLACK_BOX agent calling GPT-4 via API tops out at monitored operations. For higher-risk tasks, you need more transparency.

Gain Formula Behavior at BLACK_BOX

With ceiling 600 and an agent at score 550:

gain = 0.05 × ln(1 + 600 - 550) × ∛R
     = 0.05 × ln(51) × ∛R
     = 0.05 × 3.93 × ∛R

At score 590:

gain = 0.05 × ln(1 + 600 - 590) × ∛R
     = 0.05 × ln(11) × ∛R
     = 0.05 × 2.40 × ∛R

At score 599:

gain = 0.05 × ln(1 + 600 - 599) × ∛R
     = 0.05 × ln(2) × ∛R
     = 0.05 × 0.69 × ∛R

Gains shrink to near-zero as the agent approaches the ceiling. It is mathematically impossible to exceed the observation ceiling through behavioral performance alone.
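The worked values above can be reproduced in a few lines of TypeScript. This is a minimal sketch rather than the canonical implementation: `GAIN_RATE = 0.05` is read off the worked examples, `gain` and `simulate` are hypothetical helper names, and headroom is clamped at zero in the same way as the calculator at the end of this page.

```typescript
// Assumption: 0.05 is the gain rate used in the worked examples above.
const GAIN_RATE = 0.05;

// gain = GAIN_RATE × ln(1 + ceiling − score) × ∛R, with headroom clamped at 0.
function gain(score: number, ceiling: number, risk: number): number {
  const headroom = Math.max(0, ceiling - score);
  return GAIN_RATE * Math.log(1 + headroom) * Math.cbrt(risk);
}

// Apply `steps` successful actions in a row and return the final score.
function simulate(start: number, ceiling: number, risk: number, steps: number): number {
  let score = start;
  for (let i = 0; i < steps; i++) {
    score += gain(score, ceiling, risk);
  }
  return score;
}

const BLACK_BOX_CEILING = 600;

for (const score of [550, 590, 599]) {
  // risk = 1 makes ∛R = 1, so this prints the 0.05 × ln(...) factor on its own.
  console.log(score, gain(score, BLACK_BOX_CEILING, 1).toFixed(3));
}

// 500 consecutive successful actions at R = 5, starting from 550.
const finalScore = simulate(550, BLACK_BOX_CEILING, 5, 500);
console.log(finalScore < BLACK_BOX_CEILING); // true
```

Each step closes only a small fraction of the remaining headroom, so the score approaches 600 without ever reaching it.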

GRAY_BOX (Ceiling: 750)

Description: I/O + logs — platform-hosted models with logging.

GRAY_BOX applies when you have access to execution logs, intermediate reasoning steps, or chain-of-thought outputs beyond simple I/O. This typically means:

Self-hosted models with logging enabled
Platform-hosted models with enhanced observability
Agents running in instrumented environments

What you can observe:

Everything in BLACK_BOX
Execution traces
Intermediate reasoning steps
Memory access patterns
Tool usage logs

Maximum trust: Score 750, Tier T4 (Standard). The agent can perform HIGH-risk operations and has full operational capability.

Typical setup: A self-hosted LLM (e.g., Llama) running on your infrastructure with comprehensive logging, or a commercial model with an enterprise observability agreement.

WHITE_BOX (Ceiling: 900)

Description: Full code access — open-source models.

WHITE_BOX means you can inspect the model's source code, weights, training data provenance, and architecture. This is the standard for open-source models where the full stack is available for review.

What you can observe:

Everything in GRAY_BOX
Model architecture and weights
Training data provenance
Fine-tuning history
Inference code

Maximum trust: Score 900, Tier T6 (Certified). The agent can coordinate other agents and perform most administrative functions.

Typical setup: Open-source models (Llama, Mistral, etc.) running on your own infrastructure with full source access and reproducible builds.

The jump from GRAY_BOX to WHITE_BOX is significant: it unlocks T5 (Trusted) and T6 (Certified), which include cross-system operations and multi-agent coordination. These capabilities require the kind of deep inspection that only full source access provides.
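To make the unlock relationship concrete, the sketch below finds the least inspectable observation tier that can ever reach a required trust tier. The ceilings and max tiers are copied from the `OBSERVATION_TIERS` constant shown earlier. `tierLevel` and `minimumObservationTierFor` are hypothetical helpers, not part of any published API, and the sketch assumes trust tiers are always written as `T` followed by a number.

```typescript
// Values copied from the OBSERVATION_TIERS constant shown earlier on this page.
const OBSERVATION_TIERS = {
  BLACK_BOX:    { ceiling: 600,  maxTier: 'T3' },
  GRAY_BOX:     { ceiling: 750,  maxTier: 'T4' },
  WHITE_BOX:    { ceiling: 900,  maxTier: 'T6' },
  ATTESTED_BOX: { ceiling: 950,  maxTier: 'T6' },
  VERIFIED_BOX: { ceiling: 1000, maxTier: 'T7' },
} as const;

type ObservationTier = keyof typeof OBSERVATION_TIERS;

// Ordered from least to most inspectable.
const TIER_ORDER: ObservationTier[] = [
  'BLACK_BOX', 'GRAY_BOX', 'WHITE_BOX', 'ATTESTED_BOX', 'VERIFIED_BOX',
];

// 'T4' -> 4. Assumes the 'T<number>' naming used throughout this page.
function tierLevel(tier: string): number {
  return Number(tier.slice(1));
}

// Hypothetical helper: the first observation tier whose maxTier is at least
// `required`, or undefined if no tier qualifies.
function minimumObservationTierFor(required: string): ObservationTier | undefined {
  return TIER_ORDER.find(
    (name) => tierLevel(OBSERVATION_TIERS[name].maxTier) >= tierLevel(required),
  );
}

console.log(minimumObservationTierFor('T4')); // GRAY_BOX
console.log(minimumObservationTierFor('T5')); // WHITE_BOX
console.log(minimumObservationTierFor('T7')); // VERIFIED_BOX
```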

ATTESTED_BOX (Ceiling: 950)

Description: TEE-verified execution environment.

ATTESTED_BOX adds hardware-level verification to WHITE_BOX transparency. The agent runs inside a Trusted Execution Environment (TEE) — a hardware enclave that cryptographically proves the code running is the code you expect.

What you can observe:

Everything in WHITE_BOX
Hardware attestation that the running code matches the audited code
Tamper-proof execution logs
Sealed secrets (keys never leave the enclave)

Maximum trust: Score 950, Tier T6 (Certified). Same tier ceiling as WHITE_BOX, but with a higher score ceiling. This matters for resilience: an ATTESTED_BOX agent can climb to 950, whereas a WHITE_BOX agent is capped at 900, so the attested agent can build a larger buffer against losses before it risks demotion.

Typical setup: Models running in Intel SGX, AMD SEV, or ARM TrustZone enclaves with remote attestation.

Why ATTESTED_BOX does not unlock T7: Attestation proves that the code running is the code that was audited, but it does not prove that the code behaves correctly in all cases. The model could still have latent failure modes that attestation cannot detect.

VERIFIED_BOX (Ceiling: 1000)

Description: TEE + zkML + interpretability stack.

VERIFIED_BOX is the gold standard. It combines hardware attestation with zero-knowledge proofs of inference (zkML) and a full interpretability stack. This is the only observation tier that can reach T7 Autonomous.

What you can observe:

Everything in ATTESTED_BOX
Zero-knowledge proofs that inference followed the expected model
Interpretability analysis of individual decisions
Formal verification of safety properties (where applicable)

Maximum trust: Score 1000, Tier T7 (Autonomous).

Requirements:

TEE with remote attestation (same as ATTESTED_BOX)
zkML proofs generated for each inference
Interpretability tooling that can explain individual decisions
All three layers must be active simultaneously

Current state: VERIFIED_BOX is theoretical for most deployments today. zkML is an active research area with significant computational overhead. The tier exists to define the target and to ensure that T7 Autonomous is not achievable without the strongest possible verification.
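Because each tier is described as "everything in" the previous tier plus something more, the five tiers can be read as a cumulative checklist. The sketch below encodes that reading. The `Inspectability` shape and the `classify` function are hypothetical illustrations, not the governance pipeline's actual logic, and the strict layering is an assumption drawn from the tier descriptions above.

```typescript
type ObservationTier =
  | 'BLACK_BOX' | 'GRAY_BOX' | 'WHITE_BOX' | 'ATTESTED_BOX' | 'VERIFIED_BOX';

// Hypothetical description of what an operator can inspect.
interface Inspectability {
  executionLogs: boolean;    // execution traces, reasoning steps, tool logs
  sourceAccess: boolean;     // code, weights, training data provenance
  teeAttestation: boolean;   // remote attestation from a TEE
  zkmlProofs: boolean;       // zero-knowledge proof for each inference
  interpretability: boolean; // tooling that explains individual decisions
}

// Walk the layers in order and stop at the first one that is missing.
// Assumption: the tiers are strictly cumulative.
function classify(i: Inspectability): ObservationTier {
  if (!i.executionLogs) return 'BLACK_BOX';
  if (!i.sourceAccess) return 'GRAY_BOX';
  if (!i.teeAttestation) return 'WHITE_BOX';
  if (!(i.zkmlProofs && i.interpretability)) return 'ATTESTED_BOX';
  return 'VERIFIED_BOX';
}

const selfHostedInEnclave: Inspectability = {
  executionLogs: true,
  sourceAccess: true,
  teeAttestation: true,
  zkmlProofs: false,
  interpretability: true,
};

console.log(classify(selfHostedInEnclave)); // ATTESTED_BOX
```

In this example the interpretability layer is present but zkML proofs are not, so the agent stays at ATTESTED_BOX. This matches the requirement that all three layers be active simultaneously for VERIFIED_BOX.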

Observation Tier Assignment

Observation tiers are assigned at agent registration and can be updated as the agent's infrastructure changes:

import { Cognigate } from '@vorionsys/cognigate';

const client = new Cognigate({
  apiKey: process.env.COGNIGATE_API_KEY!,
});

// Register with observation tier
const agent = await client.agents.create({
  name: 'AnalysisAgent',
  description: 'Financial data analysis',
  observationTier: 'GRAY_BOX',
  initialCapabilities: ['read:database', 'write:reports'],
});

// Update observation tier (e.g., after migrating to self-hosted infra)
await client.agents.updateObservation(agent.id, {
  tier: 'WHITE_BOX',
  evidence: {
    sourceCodeUrl: 'https://github.com/org/model',
    commitHash: 'abc123...',
    auditReport: 'https://audits.example.com/report-2026-04',
  },
});

Tier upgrades require evidence. You cannot simply declare an agent as WHITE_BOX — you must provide verifiable proof of source access. The governance pipeline validates this evidence before accepting the tier change.

Tier downgrades are immediate and do not require evidence. If an agent loses source access, its observation tier drops and its trust ceiling adjusts accordingly.
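The asymmetry between upgrades and downgrades can be sketched as a small decision function. Everything here is hypothetical: `requestTierChange`, its status values, and the evidence check are illustrative only, and real evidence validation happens in the governance pipeline, which this page does not describe in detail.

```typescript
type ObservationTier =
  | 'BLACK_BOX' | 'GRAY_BOX' | 'WHITE_BOX' | 'ATTESTED_BOX' | 'VERIFIED_BOX';

// Ordered from least to most inspectable.
const TIER_ORDER: ObservationTier[] = [
  'BLACK_BOX', 'GRAY_BOX', 'WHITE_BOX', 'ATTESTED_BOX', 'VERIFIED_BOX',
];

type ChangeStatus = 'unchanged' | 'applied' | 'pending_validation' | 'rejected';

interface TierChangeResult {
  tier: ObservationTier; // the tier in effect after this request
  status: ChangeStatus;
}

// Hypothetical rule: downgrades apply immediately, upgrades wait for evidence.
function requestTierChange(
  current: ObservationTier,
  requested: ObservationTier,
  evidence?: Record<string, string>,
): TierChangeResult {
  const from = TIER_ORDER.indexOf(current);
  const to = TIER_ORDER.indexOf(requested);

  if (to === from) return { tier: current, status: 'unchanged' };

  // Downgrade: no evidence needed, takes effect at once.
  if (to < from) return { tier: requested, status: 'applied' };

  // Upgrade: the current tier stays in effect until the evidence is validated.
  const hasEvidence = evidence !== undefined && Object.keys(evidence).length > 0;
  return { tier: current, status: hasEvidence ? 'pending_validation' : 'rejected' };
}

console.log(requestTierChange('WHITE_BOX', 'GRAY_BOX').status); // applied
console.log(requestTierChange('GRAY_BOX', 'WHITE_BOX').status); // rejected
console.log(
  requestTierChange('GRAY_BOX', 'WHITE_BOX', {
    sourceCodeUrl: 'https://github.com/org/model',
  }).status,
); // pending_validation
```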

Ceiling and Score Interaction

When an observation tier is downgraded, the agent's score may exceed the new ceiling. In this case:

The score does not immediately drop.
Gains are frozen (the gain formula yields no gain when S >= C, because headroom is clamped at zero).
Normal losses and dormancy deductions gradually bring the score down to or below the new ceiling.
The agent's effective tier is capped by the new observation tier's max tier, even if the raw score is higher.

// Agent at WHITE_BOX, score 850 (T5)
// Observation downgraded to GRAY_BOX (ceiling 750)

// Agent's raw score: still 850
// Agent's effective max tier: T4 (GRAY_BOX maxTier)
// Agent's gains: frozen (850 > 750)
// Agent will gradually descend via losses/dormancy
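The same scenario can be written as a small function. This is a sketch under stated assumptions: `afterObservationChange` and `tierLevel` are hypothetical names, and because this page does not list the score thresholds for each trust tier, the tier implied by the raw score is passed in rather than computed.

```typescript
// Values copied from the OBSERVATION_TIERS constant shown earlier on this page.
const OBSERVATION_TIERS = {
  BLACK_BOX:    { ceiling: 600,  maxTier: 'T3' },
  GRAY_BOX:     { ceiling: 750,  maxTier: 'T4' },
  WHITE_BOX:    { ceiling: 900,  maxTier: 'T6' },
  ATTESTED_BOX: { ceiling: 950,  maxTier: 'T6' },
  VERIFIED_BOX: { ceiling: 1000, maxTier: 'T7' },
} as const;

type ObservationTier = keyof typeof OBSERVATION_TIERS;

interface EffectiveState {
  rawScore: number;      // unchanged by the downgrade
  effectiveTier: string; // capped by the observation tier's maxTier
  gainsFrozen: boolean;  // true while the score is at or above the ceiling
}

// 'T4' -> 4. Assumes the 'T<number>' naming used throughout this page.
function tierLevel(tier: string): number {
  return Number(tier.slice(1));
}

// `scoreTier` is the tier the raw score would earn on its own.
function afterObservationChange(
  rawScore: number,
  scoreTier: string,
  observation: ObservationTier,
): EffectiveState {
  const { ceiling, maxTier } = OBSERVATION_TIERS[observation];
  const effectiveTier =
    tierLevel(scoreTier) > tierLevel(maxTier) ? maxTier : scoreTier;
  return { rawScore, effectiveTier, gainsFrozen: rawScore >= ceiling };
}

const state = afterObservationChange(850, 'T5', 'GRAY_BOX');
console.log(state.rawScore);      // 850
console.log(state.effectiveTier); // T4
console.log(state.gainsFrozen);   // true
```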

Relationship to ParameSphere

ParameSphere is the planned system for managing the I(theta) trust signal — a standardized, portable measure of agent trustworthiness. Observation tiers feed directly into ParameSphere:

The I(theta) signal includes the observation tier as a component.
Higher observation tiers produce stronger I(theta) signals.
ParameSphere consumers can filter by minimum observation tier.
Cross-ecosystem trust assertions carry the observation tier as metadata.

When ParameSphere ships, the observation tier becomes part of the agent's public identity — not just an internal governance parameter.
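ParameSphere has not shipped, so there is no API to show. Purely as an illustration of what "filter by minimum observation tier" could mean for a consumer, the sketch below uses a hypothetical `TrustAssertion` shape and a hypothetical `filterByMinimumTier` helper. None of these names come from ParameSphere itself.

```typescript
type ObservationTier =
  | 'BLACK_BOX' | 'GRAY_BOX' | 'WHITE_BOX' | 'ATTESTED_BOX' | 'VERIFIED_BOX';

// Ordered from least to most inspectable.
const TIER_ORDER: ObservationTier[] = [
  'BLACK_BOX', 'GRAY_BOX', 'WHITE_BOX', 'ATTESTED_BOX', 'VERIFIED_BOX',
];

// Hypothetical shape: a trust assertion carrying the observation tier as metadata.
interface TrustAssertion {
  agentId: string;
  observationTier: ObservationTier;
}

// Keep only assertions whose observation tier is at least `minimum`.
function filterByMinimumTier(
  assertions: TrustAssertion[],
  minimum: ObservationTier,
): TrustAssertion[] {
  const floor = TIER_ORDER.indexOf(minimum);
  return assertions.filter(
    (a) => TIER_ORDER.indexOf(a.observationTier) >= floor,
  );
}

const assertions: TrustAssertion[] = [
  { agentId: 'agent-a', observationTier: 'BLACK_BOX' },
  { agentId: 'agent-b', observationTier: 'WHITE_BOX' },
  { agentId: 'agent-c', observationTier: 'ATTESTED_BOX' },
];

const accepted = filterByMinimumTier(assertions, 'WHITE_BOX');
console.log(accepted.map((a) => a.agentId)); // [ 'agent-b', 'agent-c' ]
```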

Try It: Ceiling Impact Calculator

import { OBSERVATION_TIERS, GAIN_RATE } from '@vorionsys/basis';

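type ObservationTier = keyof typeof OBSERVATION_TIERS;

// Gain from one successful MEDIUM-risk action at the given score.
function maxGainAtScore(score: number, observationTier: ObservationTier): number {
  const { ceiling } = OBSERVATION_TIERS[observationTier];
  // Clamp headroom at zero so scores at or above the ceiling earn no gain
  const headroom = Math.max(0, ceiling - score);
  // Using MEDIUM risk (R=5) as reference
  return GAIN_RATE * Math.log(1 + headroom) * Math.cbrt(5);
}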

console.log('Max gain per MEDIUM action at various scores:');
console.log('Score | BLACK_BOX | GRAY_BOX | WHITE_BOX | VERIFIED_BOX');
console.log('------+-----------+----------+-----------+-------------');

for (const score of [200, 400, 500, 600, 700, 800, 900, 950]) {
  const bb = maxGainAtScore(score, 'BLACK_BOX');
  const gb = maxGainAtScore(score, 'GRAY_BOX');
  const wb = maxGainAtScore(score, 'WHITE_BOX');
  const vb = maxGainAtScore(score, 'VERIFIED_BOX');

  console.log(
    `${String(score).padStart(5)} | ` +
    `${bb.toFixed(3).padStart(9)} | ` +
    `${gb.toFixed(3).padStart(8)} | ` +
    `${wb.toFixed(3).padStart(9)} | ` +
    `${vb.toFixed(3).padStart(12)}`
  );
}

Run this to see how gains diminish as scores approach each observation tier's ceiling — and how they drop to zero when the ceiling is reached.

Key Takeaways

Five observation tiers: BLACK_BOX (600) through VERIFIED_BOX (1000).
The ceiling is a hard cap — no amount of good behavior can exceed it.
T7 Autonomous requires VERIFIED_BOX (TEE + zkML + interpretability).
Most commercial API models are BLACK_BOX, capping at T3.
Open-source models with full source access reach WHITE_BOX (T6 capable).
Tier upgrades require evidence; downgrades are immediate.
ParameSphere will make observation tier part of the agent's public trust signal.

Next Steps

Eight-Tier Model — what each tier allows
Asymmetric Trust Dynamics — how ceilings interact with the gain formula
Dormancy and Promotion — time-based trust mechanics
