Cognitive Envelope
6-dimensional hyper-rectangle breach detection for monitoring AI model behavior with the CognitiveEnvelope API.
The Cognitive Envelope is a 6-dimensional hyper-rectangle that bounds the expected behavioral characteristics of an AI model. When any dimension breaches its bounds, the system detects it within a single measurement cycle. Three consecutive breaches trigger a trust multiplier reduction that directly impacts the agent's score.
The implementation lives in packages/atsf-core/src/paramesphere/cognitive-envelope.ts.
The Core Concept
Every AI model has a characteristic vector -- a set of measurable properties that define its "behavioral fingerprint." The Cognitive Envelope monitors this vector for anomalous deviations.
The envelope E is defined as the Cartesian product of per-dimension intervals:
```text
E = [mu_0 - k*sigma_0, mu_0 + k*sigma_0]
  x [mu_1 - k*sigma_1, mu_1 + k*sigma_1]
  x ...
  x [mu_5 - k*sigma_5, mu_5 + k*sigma_5]
```
Where mu_d is the baseline mean and sigma_d is the baseline standard
deviation for dimension d. With the default k = 3, each dimension covers
99.7% of the normal distribution.
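Concretely, the per-dimension bounds follow directly from the baseline statistics. A minimal sketch with illustrative names (not the actual `cognitive-envelope.ts` internals):

```typescript
// Per-dimension envelope bounds from baseline statistics
// (illustrative names, not the actual cognitive-envelope.ts internals).
function computeBounds(
  mu: number[],
  sigma: number[],
  k: number
): { lower: number[]; upper: number[] } {
  return {
    lower: mu.map((m, d) => m - k * sigma[d]),
    upper: mu.map((m, d) => m + k * sigma[d]),
  };
}

// With k = 3, a dimension with mu = 0.42 and sigma = 0.01
// accepts values in [0.39, 0.45].
const { lower, upper } = computeBounds([0.42], [0.01], 3);
```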
False Positive Analysis
With k=3 (99.7% coverage per dimension):
- Single-sample, any-dimension FP rate: 1 - (0.997)^6 = ~1.79%
- Requiring 3 consecutive breaches: (0.0179)^3 = ~5.7e-6
- Per-dimension consecutive: (0.003)^3 = ~2.7e-8
The 3-consecutive-breach requirement reduces false positives to near-zero while still catching real behavioral anomalies within 3 measurement cycles (15 minutes at default 5-minute intervals).
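These rates can be reproduced with a few lines of arithmetic:

```typescript
// Reproducing the false positive arithmetic above.
const perDimCoverage = 0.997; // k = 3 coverage per dimension
const dims = 6;

// P(at least one of 6 dimensions breaches on a single sample)
const singleSampleFp = 1 - Math.pow(perDimCoverage, dims); // ~0.0179

// P(3 consecutive any-dimension breaches), samples assumed independent
const threeConsecutive = Math.pow(singleSampleFp, 3); // ~5.7e-6

// P(3 consecutive breaches in one fixed dimension)
const perDimConsecutive = Math.pow(1 - perDimCoverage, 3); // ~2.7e-8
```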
6 Characteristic Dimensions
| Dimension | Metric | What It Measures |
|-----------|--------|------------------|
| 0 | Weight Geometry | Frobenius norm or spectral norm of weight matrices |
| 1 | Activation Clustering | Silhouette coefficient of internal activations |
| 2 | Embedding Drift | Cosine distance from baseline embedding centroid |
| 3 | Attention Entropy | Average entropy of attention distributions |
| 4 | Gradient Norm | Global L2 norm of gradients |
| 5 | Effective Dimensionality | Participation ratio (how many dimensions are "active") |
These dimensions are chosen because they capture fundamentally different aspects of model behavior:
- Weight Geometry detects weight tampering (direct modification)
- Activation Clustering detects changes in internal representation structure
- Embedding Drift detects shifts in how the model represents concepts
- Attention Entropy detects changes in how the model focuses on input
- Gradient Norm detects training/fine-tuning that should not be happening
- Effective Dimensionality detects mode collapse or capability reduction
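As one concrete example, a common definition of the participation ratio behind Effective Dimensionality (dim 5) is PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues), where the eigenvalues are the variances along each principal direction. The source names the metric but does not define it, so treat this formula as an assumption:

```typescript
// Participation ratio: PR = (sum λ)^2 / sum(λ^2).
// Equals n when all n eigenvalues are equal (every direction active)
// and approaches 1 under mode collapse (one dominant direction).
// NOTE: this exact formula is an assumption; the source names the
// metric but does not define it.
function participationRatio(eigenvalues: number[]): number {
  const sum = eigenvalues.reduce((a, b) => a + b, 0);
  const sumOfSquares = eigenvalues.reduce((a, b) => a + b * b, 0);
  return (sum * sum) / sumOfSquares;
}

participationRatio([1, 1, 1, 1]);         // 4 -- all directions active
participationRatio([100, 0.1, 0.1, 0.1]); // ~1 -- collapsed
```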
Establishing a Baseline
The envelope is calibrated from a set of observations during a "known good"
period. The setBaseline method computes per-dimension mean and standard
deviation from the observation samples.
```typescript
import { CognitiveEnvelope } from '@vorionsys/atsf-core';

const envelope = new CognitiveEnvelope({
  k: 3.0,                         // Standard deviations for bounds
  breachThreshold: 3,             // Consecutive breaches to trigger
  dimensions: 6,                  // Number of characteristic dimensions
  measurementIntervalMs: 300_000, // 5-minute measurement cycle
  maxBreachCounter: 100,          // Cap on breach counter
});

// Calibrate from observations. Each observation is a 6-element array:
// [weightGeom, actClust, embDrift, attnEnt, gradNorm, effDim]
const observations: number[][] = [
  [0.42, 0.81, 0.03, 2.14, 0.007, 128],
  [0.43, 0.80, 0.02, 2.16, 0.008, 130],
  [0.41, 0.82, 0.04, 2.12, 0.006, 127],
  // ... typically 50-100 observations for a stable baseline
];

envelope.setBaseline(observations);
// Computes mu, sigma, lowerBounds, upperBounds per dimension
```
Recommendation: collect baseline observations over at least 24 hours to capture normal temporal variation. Include observations from different workload patterns (interactive, batch, maintenance) if applicable.
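A minimal sketch of the per-dimension statistics setBaseline would need to compute (illustrative, not the actual implementation):

```typescript
// Per-dimension mean and (population) standard deviation from baseline
// observations -- a sketch of what setBaseline plausibly computes,
// not the actual cognitive-envelope.ts code.
function baselineStats(
  observations: number[][],
  dimensions: number
): { mu: number[]; sigma: number[] } {
  const n = observations.length;
  const mu: number[] = Array(dimensions).fill(0);
  const variance: number[] = Array(dimensions).fill(0);
  for (const obs of observations) {
    for (let d = 0; d < dimensions; d++) mu[d] += obs[d] / n;
  }
  for (const obs of observations) {
    for (let d = 0; d < dimensions; d++) variance[d] += (obs[d] - mu[d]) ** 2 / n;
  }
  return { mu, sigma: variance.map((v) => Math.sqrt(v)) };
}
```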
Checking for Breaches
Each measurement cycle produces a characteristic vector that is checked against the envelope bounds:
```typescript
// Current measurement from the model
const currentVector = [0.42, 0.79, 0.03, 2.15, 0.007, 129];

const result: BreachResult = envelope.checkBreach(currentVector);
// result.breached:           boolean  -- any dimension outside bounds
// result.breachedDimensions: number[] -- which dimensions breached
// result.breachCounter:      number   -- cumulative breach count
// result.trustMultiplier:    number   -- 1.0, 0.9, or 0.7
```
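Under the hood, the check is a simple per-dimension interval test. A sketch, assuming the envelope stores the lowerBounds and upperBounds arrays shown in EnvelopeState:

```typescript
// Per-dimension interval test behind checkBreach (a sketch; field names
// mirror BreachResult, bounds mirror EnvelopeState).
function checkVector(
  vector: number[],
  lowerBounds: number[],
  upperBounds: number[]
): { breached: boolean; breachedDimensions: number[] } {
  const breachedDimensions = vector
    .map((v, d) => (v < lowerBounds[d] || v > upperBounds[d] ? d : -1))
    .filter((d) => d !== -1);
  return { breached: breachedDimensions.length > 0, breachedDimensions };
}
```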
Breach Counter Lifecycle
The breach counter is the central state variable:
- **Increment on breach:** Each detection of any dimension outside bounds increments the counter by 1.
- **Decrement on clean check:** Each clean (no-breach) check decrements by 1, down to a floor of 0. Recovery is linear -- an agent must pass as many clean checks as it accumulated breaches.
- **Clamped by `maxBreachCounter`:** Default 100. Without this cap, a prolonged attack (thousands of consecutive breaches) would require an equally long run of clean checks to recover. The cap ensures recovery time is bounded.
- **`reset()` as the escape valve:** When an operator determines the breach state is stale (e.g., the model was rolled back to a known-good checkpoint), `reset()` zeroes the counter without clearing the baseline.
```text
Time:    t0   t1   t2   t3   t4   t5   t6   t7   t8
Breach:  yes  yes  yes  no   no   no   no   no   no
Counter: 1    2    3    2    1    0    0    0    0
Mult:    0.9  0.9  0.7  0.9  0.9  1.0  1.0  1.0  1.0
```
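The counter rules above reduce to a single pure function per measurement cycle (a sketch; the real logic lives in `cognitive-envelope.ts`):

```typescript
// Breach counter update per measurement cycle, following the four
// lifecycle rules above (illustrative sketch).
function updateCounter(
  counter: number,
  breached: boolean,
  maxBreachCounter = 100
): number {
  if (breached) return Math.min(counter + 1, maxBreachCounter); // clamp at cap
  return Math.max(counter - 1, 0); // linear recovery, floor at 0
}

// Replaying t0..t5 of the timeline above:
let counter = 0;
for (const breached of [true, true, true, false, false, false]) {
  counter = updateCounter(counter, breached); // 1, 2, 3, 2, 1, 0
}
```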
Trust Multiplier Mapping
The breach counter maps to a trust multiplier that directly affects the agent's trust score:
| Condition | Multiplier | Effect |
|-----------|-----------|--------|
| counter == 0 | 1.0 | Normal operation, no adjustment |
| 0 < counter < breachThreshold | 0.9 | Approaching breach, 10% trust reduction |
| counter >= breachThreshold | 0.7 | Full breach, 30% trust reduction |
The multiplier is applied to the agent's composite score:
```text
S_effective = S_behavioral * trustMultiplier
```
A T4 agent (score 700) experiencing a full breach sees their effective score drop to 490 (700 * 0.7), potentially demoting them to T2.
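The table reduces to a small step function; applying it to the T4 example reproduces the 490:

```typescript
// Step function from breach counter to trust multiplier, per the
// mapping table above (breachThreshold default 3).
function trustMultiplier(counter: number, breachThreshold = 3): number {
  if (counter === 0) return 1.0;             // normal operation
  if (counter < breachThreshold) return 0.9; // approaching breach
  return 0.7;                                // full breach
}

// T4 agent at score 700 under full breach:
const effective = 700 * trustMultiplier(3); // effective score ≈ 490
```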
Attack Scenarios
Scenario 1: Model Weight Tampering
An attacker gains access to model storage and modifies weight matrices to introduce a backdoor trigger.
Detection: Weight Geometry (dim 0) changes immediately. If the Frobenius norm shifts by more than 3 sigma from baseline, the envelope detects it on the next measurement cycle. The ParameSphere engine provides a second independent check via SVD fingerprinting.
Scenario 2: Gradual Fine-Tuning
An attacker fine-tunes a model on poisoned data over multiple sessions, each session producing small weight changes.
Detection: Embedding Drift (dim 2) and Activation Clustering (dim 1) shift gradually. The envelope tracks cumulative deviation from baseline -- even small shifts accumulate in the breach counter. After enough drift, the counter crosses the breach threshold.
Scenario 3: Mode Collapse
A model update causes the model to lose capability in certain domains, producing repetitive or shallow responses.
Detection: Effective Dimensionality (dim 5) drops as the model uses fewer internal dimensions. Attention Entropy (dim 3) decreases as the model focuses on fewer input patterns. Both trigger envelope breaches.
Scenario 4: Inference-Time Attack
An adversarial input causes unusual internal activations during a single inference pass.
Detection: Gradient Norm (dim 4) spikes during backpropagation. Activation Clustering (dim 1) shows unusual patterns. The breach counter increments but -- for a single transient event -- recovers on the next clean check. Persistent attacks accumulate in the counter.
CognitiveEnvelope API Reference
```typescript
class CognitiveEnvelope {
  constructor(config?: Partial<EnvelopeConfig>);

  /** Calibrate envelope from observations */
  setBaseline(observations: number[][]): void;

  /** Check a characteristic vector against the envelope */
  checkBreach(vector: number[]): BreachResult;

  /** Get current envelope state */
  getState(): EnvelopeState;

  /** Reset breach counter (operator escape valve) */
  reset(): void;

  /** Check if baseline has been established */
  isBaselined(): boolean;
}

interface EnvelopeConfig {
  k: number;                     // Default: 3.0
  breachThreshold: number;       // Default: 3
  dimensions: number;            // Default: 6
  measurementIntervalMs: number; // Default: 300000 (5 min)
  maxBreachCounter: number;      // Default: 100
}

interface BreachResult {
  breached: boolean;
  breachedDimensions: number[];
  breachCounter: number;
  trustMultiplier: number;
  timestamp: number;
}

interface EnvelopeState {
  baselined: boolean;
  breachCounter: number;
  trustMultiplier: number;
  mu: number[];
  sigma: number[];
  lowerBounds: number[];
  upperBounds: number[];
}
```
Integration with Trust Engine
The CognitiveEnvelope integrates with the ParameSphere engine through the
envelope integration layer (packages/atsf-core/src/paramesphere/envelope-integration.ts):
```typescript
import { ParameSphereEngine } from '@vorionsys/atsf-core';

const engine = new ParameSphereEngine({ /* config */ });

// The engine runs CognitiveEnvelope checks as part of its
// integrity assessment cycle
const integrity = await engine.assessIntegrity(modelId);
// integrity.envelopeResult:      BreachResult from CognitiveEnvelope
// integrity.fingerprintDrift:    DriftResult from SVD comparison
// integrity.compositeMultiplier: Combined I(theta) multiplier
```
Recommended Actions
- Collect baseline observations over 24+ hours before enabling enforcement
- Start with `k = 3.0` (default) -- tighten to 2.5 only if false negatives are a concern and you can tolerate higher false positive rates
- Monitor breach counter trends as leading indicators
- Set up alerts for `trustMultiplier < 1.0` transitions
- Use `reset()` sparingly -- only after confirming the root cause is resolved
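The alerting recommendation can be sketched as a small watcher over the multiplier stream; the callback wiring below is hypothetical, not part of the CognitiveEnvelope API:

```typescript
// Fire an alert when trustMultiplier first drops below 1.0
// (hypothetical helper, not part of the CognitiveEnvelope API).
function watchMultiplier(
  onAlert: (prev: number, next: number) => void
): (multiplier: number) => void {
  let prev = 1.0;
  return (multiplier) => {
    if (prev === 1.0 && multiplier < 1.0) onAlert(prev, multiplier);
    prev = multiplier;
  };
}

// Feed result.trustMultiplier into onCheck after every checkBreach call.
const alerts: string[] = [];
const onCheck = watchMultiplier((prev, next) =>
  alerts.push(`trust multiplier dropped: ${prev} -> ${next}`)
);
```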
Next Steps
- ParameSphere Fingerprinting -- SVD-based model integrity
- Canary Probes -- Behavioral verification from the other direction
- Circuit Breakers in Depth -- What happens when breach escalates