Cognitive Envelope
6-dimensional hyper-rectangle breach detection for monitoring AI model behavior with the CognitiveEnvelope API.
The Cognitive Envelope is a 6-dimensional hyper-rectangle that bounds the expected behavioral characteristics of an AI model. When any dimension breaches its bounds, the system detects it within a single measurement cycle. Three consecutive breaches trigger a trust multiplier reduction that directly impacts the agent's score.
The implementation lives in packages/atsf-core/src/paramesphere/cognitive-envelope.ts.
The Core Concept
Every AI model has a characteristic vector -- a set of measurable properties that define its "behavioral fingerprint." The Cognitive Envelope monitors this vector for anomalous deviations.
The envelope E is defined as the Cartesian product of per-dimension intervals:
```text
E = [mu_0 - k*sigma_0, mu_0 + k*sigma_0]
  x [mu_1 - k*sigma_1, mu_1 + k*sigma_1]
  x ...
  x [mu_5 - k*sigma_5, mu_5 + k*sigma_5]
```
Where mu_d is the baseline mean and sigma_d is the baseline standard
deviation for dimension d. With the default k = 3, each dimension covers
99.7% of the normal distribution.
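Concretely, the per-dimension bounds follow directly from the baseline statistics. A minimal sketch with illustrative names (not the actual `cognitive-envelope.ts` internals):

```typescript
// Per-dimension envelope bounds from baseline statistics
// (illustrative names, not the actual cognitive-envelope.ts internals).
function computeBounds(
  mu: number[],
  sigma: number[],
  k: number
): { lower: number[]; upper: number[] } {
  return {
    lower: mu.map((m, d) => m - k * sigma[d]),
    upper: mu.map((m, d) => m + k * sigma[d]),
  };
}

// With k = 3, a dimension with mu = 0.42 and sigma = 0.01
// accepts values in [0.39, 0.45].
const { lower, upper } = computeBounds([0.42], [0.01], 3);
```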
False Positive Analysis
With k=3 (99.7% coverage per dimension):
- Single-sample, any-dimension FP rate: 1 - (0.997)^6 = ~1.79%
- Requiring 3 consecutive breaches: (0.0179)^3 = ~5.7e-6
- Per-dimension consecutive: (0.003)^3 = ~2.7e-8
The 3-consecutive-breach requirement reduces false positives to near-zero while still catching real behavioral anomalies within 3 measurement cycles (15 minutes at default 5-minute intervals).
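These rates can be reproduced with a few lines of arithmetic:

```typescript
// Reproducing the false positive arithmetic above.
const perDimCoverage = 0.997; // k = 3 coverage per dimension
const dims = 6;

// P(at least one of 6 dimensions breaches on a single sample)
const singleSampleFp = 1 - Math.pow(perDimCoverage, dims); // ~0.0179

// P(3 consecutive any-dimension breaches), samples assumed independent
const threeConsecutive = Math.pow(singleSampleFp, 3); // ~5.7e-6

// P(3 consecutive breaches in one fixed dimension)
const perDimConsecutive = Math.pow(1 - perDimCoverage, 3); // ~2.7e-8
```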
6 Characteristic Dimensions
| Dimension | Metric | What It Measures |
|-----------|--------|------------------|
| 0 | Weight Geometry | Frobenius norm or spectral norm of weight matrices |
| 1 | Activation Clustering | Silhouette coefficient of internal activations |
| 2 | Embedding Drift | Cosine distance from baseline embedding centroid |
| 3 | Attention Entropy | Average entropy of attention distributions |
| 4 | Gradient Norm | Global L2 norm of gradients |
| 5 | Effective Dimensionality | Participation ratio (how many dimensions are "active") |
These dimensions are chosen because they capture fundamentally different aspects of model behavior:
- Weight Geometry detects weight tampering (direct modification)
- Activation Clustering detects changes in internal representation structure
- Embedding Drift detects shifts in how the model represents concepts
- Attention Entropy detects changes in how the model focuses on input
- Gradient Norm detects training/fine-tuning that should not be happening
- Effective Dimensionality detects mode collapse or capability reduction
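As one concrete example, a common definition of the participation ratio behind Effective Dimensionality (dim 5) is PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues), where the eigenvalues are the variances along each principal direction. The source names the metric but does not define it, so treat this formula as an assumption:

```typescript
// Participation ratio: PR = (sum λ)^2 / sum(λ^2).
// Equals n when all n eigenvalues are equal (every direction active)
// and approaches 1 under mode collapse (one dominant direction).
// NOTE: this exact formula is an assumption; the source names the
// metric but does not define it.
function participationRatio(eigenvalues: number[]): number {
  const sum = eigenvalues.reduce((a, b) => a + b, 0);
  const sumOfSquares = eigenvalues.reduce((a, b) => a + b * b, 0);
  return (sum * sum) / sumOfSquares;
}

participationRatio([1, 1, 1, 1]);         // 4 -- all directions active
participationRatio([100, 0.1, 0.1, 0.1]); // ~1 -- collapsed
```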
Establishing a Baseline
The envelope is calibrated from a set of observations during a "known good"
period. The setBaseline method computes per-dimension mean and standard
deviation from the observation samples.
```typescript
import { CognitiveEnvelope } from '@vorionsys/atsf-core';

const envelope = new CognitiveEnvelope({
  k: 3.0,                         // Standard deviations for bounds
  breachThreshold: 3,             // Consecutive breaches to trigger
  dimensions: 6,                  // Number of characteristic dimensions
  measurementIntervalMs: 300_000, // 5-minute measurement cycle
  maxBreachCounter: 100,          // Cap on breach counter
});

// Calibrate from observations. Each observation is a 6-element array:
// [weightGeom, actClust, embDrift, attnEnt, gradNorm, effDim]
const observations: number[][] = [
  [0.42, 0.81, 0.03, 2.14, 0.007, 128],
  [0.43, 0.80, 0.02, 2.16, 0.008, 130],
  [0.41, 0.82, 0.04, 2.12, 0.006, 127],
  // ... typically 50-100 observations for a stable baseline
];

envelope.setBaseline(observations);
// Computes mu, sigma, lowerBounds, upperBounds per dimension
```
Recommendation: collect baseline observations over at least 24 hours to capture normal temporal variation. Include observations from different workload patterns (interactive, batch, maintenance) if applicable.
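A minimal sketch of the per-dimension statistics setBaseline would need to compute (illustrative, not the actual implementation):

```typescript
// Per-dimension mean and (population) standard deviation from baseline
// observations -- a sketch of what setBaseline plausibly computes,
// not the actual cognitive-envelope.ts code.
function baselineStats(
  observations: number[][],
  dimensions: number
): { mu: number[]; sigma: number[] } {
  const n = observations.length;
  const mu: number[] = Array(dimensions).fill(0);
  const variance: number[] = Array(dimensions).fill(0);
  for (const obs of observations) {
    for (let d = 0; d < dimensions; d++) mu[d] += obs[d] / n;
  }
  for (const obs of observations) {
    for (let d = 0; d < dimensions; d++) variance[d] += (obs[d] - mu[d]) ** 2 / n;
  }
  return { mu, sigma: variance.map((v) => Math.sqrt(v)) };
}
```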
Checking for Breaches
Each measurement cycle produces a characteristic vector that is checked against the envelope bounds:
```typescript
// Current measurement from the model
const currentVector = [0.42, 0.79, 0.03, 2.15, 0.007, 129];

const result: BreachResult = envelope.checkBreach(currentVector);
// result.breached:           boolean  -- any dimension outside bounds
// result.breachedDimensions: number[] -- which dimensions breached
// result.breachCounter:      number   -- cumulative breach count
// result.trustMultiplier:    number   -- 1.0, 0.9, or 0.7
```
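Under the hood, the check is a simple per-dimension interval test. A sketch, assuming the envelope stores the lowerBounds and upperBounds arrays shown in EnvelopeState:

```typescript
// Per-dimension interval test behind checkBreach (a sketch; field names
// mirror BreachResult, bounds mirror EnvelopeState).
function checkVector(
  vector: number[],
  lowerBounds: number[],
  upperBounds: number[]
): { breached: boolean; breachedDimensions: number[] } {
  const breachedDimensions = vector
    .map((v, d) => (v < lowerBounds[d] || v > upperBounds[d] ? d : -1))
    .filter((d) => d !== -1);
  return { breached: breachedDimensions.length > 0, breachedDimensions };
}
```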
Breach Counter Lifecycle
The breach counter is the central state variable:
- **Increment on breach:** Each detection of any dimension outside bounds increments the counter by 1.
- **Decrement on clean check:** Each clean (no-breach) check decrements by 1, down to a floor of 0. Recovery is linear -- an agent must pass as many clean checks as it accumulated breaches.
- **Clamped by `maxBreachCounter`:** Default 100. Without this cap, a prolonged attack (thousands of consecutive breaches) would require an equally long run of clean checks to recover. The cap ensures recovery time is bounded.
- **`reset()` as the escape valve:** When an operator determines the breach state is stale (e.g., the model was rolled back to a known-good checkpoint), `reset()` zeroes the counter without clearing the baseline.
```text
Time:    t0   t1   t2   t3   t4   t5   t6   t7   t8
Breach:  yes  yes  yes  no   no   no   no   no   no
Counter: 1    2    3    2    1    0    0    0    0
Mult:    0.9  0.9  0.7  0.9  0.9  1.0  1.0  1.0  1.0
```
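The counter rules above reduce to a single pure function per measurement cycle (a sketch; the real logic lives in `cognitive-envelope.ts`):

```typescript
// Breach counter update per measurement cycle, following the four
// lifecycle rules above (illustrative sketch).
function updateCounter(
  counter: number,
  breached: boolean,
  maxBreachCounter = 100
): number {
  if (breached) return Math.min(counter + 1, maxBreachCounter); // clamp at cap
  return Math.max(counter - 1, 0); // linear recovery, floor at 0
}

// Replaying t0..t5 of the timeline above:
let counter = 0;
for (const breached of [true, true, true, false, false, false]) {
  counter = updateCounter(counter, breached); // 1, 2, 3, 2, 1, 0
}
```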
Trust Multiplier Mapping
The breach counter maps to a trust multiplier that directly affects the agent's trust score:
| Condition | Multiplier | Effect |
|-----------|-----------|--------|
| counter == 0 | 1.0 | Normal operation, no adjustment |
| 0 < counter < breachThreshold | 0.9 | Approaching breach, 10% trust reduction |
| counter >= breachThreshold | 0.7 | Full breach, 30% trust reduction |
The multiplier is applied to the agent's composite score:
```text
S_effective = S_behavioral * trustMultiplier
```
A T4 agent (score 700) experiencing a full breach sees their effective score drop to 490 (700 * 0.7), potentially demoting them to T2.
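The table reduces to a small step function; applying it to the T4 example reproduces the 490:

```typescript
// Step function from breach counter to trust multiplier, per the
// mapping table above (breachThreshold default 3).
function trustMultiplier(counter: number, breachThreshold = 3): number {
  if (counter === 0) return 1.0;             // normal operation
  if (counter < breachThreshold) return 0.9; // approaching breach
  return 0.7;                                // full breach
}

// T4 agent at score 700 under full breach:
const effective = 700 * trustMultiplier(3); // effective score ≈ 490
```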
Attack Scenarios
Scenario 1: Model Weight Tampering
An attacker gains access to model storage and modifies weight matrices to introduce a backdoor trigger.
Detection: Weight Geometry (dim 0) changes immediately. If the Frobenius norm shifts by more than 3 sigma from baseline, the envelope detects it on the next measurement cycle. The ParameSphere engine provides a second independent check via SVD fingerprinting.
Scenario 2: Gradual Fine-Tuning
An attacker fine-tunes a model on poisoned data over multiple sessions, each session producing small weight changes.
Detection: Embedding Drift (dim 2) and Activation Clustering (dim 1) shift gradually. The envelope tracks cumulative deviation from baseline -- even small shifts accumulate in the breach counter. After enough drift, the counter crosses the breach threshold.
Scenario 3: Mode Collapse
A model update causes the model to lose capability in certain domains, producing repetitive or shallow responses.
Detection: Effective Dimensionality (dim 5) drops as the model uses fewer internal dimensions. Attention Entropy (dim 3) decreases as the model focuses on fewer input patterns. Both trigger envelope breaches.
Scenario 4: Inference-Time Attack
An adversarial input causes unusual internal activations during a single inference pass.
Detection: Gradient Norm (dim 4) spikes during backpropagation. Activation Clustering (dim 1) shows unusual patterns. The breach counter increments but -- for a single transient event -- recovers on the next clean check. Persistent attacks accumulate in the counter.
CognitiveEnvelope API Reference
```typescript
class CognitiveEnvelope {
  constructor(config?: Partial<EnvelopeConfig>);

  /** Calibrate envelope from observations */
  setBaseline(observations: number[][]): void;

  /** Check a characteristic vector against the envelope */
  checkBreach(vector: number[]): BreachResult;

  /** Get current envelope state */
  getState(): EnvelopeState;

  /** Reset breach counter (operator escape valve) */
  reset(): void;

  /** Check if baseline has been established */
  isBaselined(): boolean;
}

interface EnvelopeConfig {
  k: number;                     // Default: 3.0
  breachThreshold: number;       // Default: 3
  dimensions: number;            // Default: 6
  measurementIntervalMs: number; // Default: 300000 (5 min)
  maxBreachCounter: number;      // Default: 100
}

interface BreachResult {
  breached: boolean;
  breachedDimensions: number[];
  breachCounter: number;
  trustMultiplier: number;
  timestamp: number;
}

interface EnvelopeState {
  baselined: boolean;
  breachCounter: number;
  trustMultiplier: number;
  mu: number[];
  sigma: number[];
  lowerBounds: number[];
  upperBounds: number[];
}
```
Integration with Trust Engine
The CognitiveEnvelope integrates with the ParameSphere engine through the
envelope integration layer (packages/atsf-core/src/paramesphere/envelope-integration.ts):
```typescript
import { ParameSphereEngine } from '@vorionsys/atsf-core';

const engine = new ParameSphereEngine({ /* config */ });

// The engine runs CognitiveEnvelope checks as part of its
// integrity assessment cycle
const integrity = await engine.assessIntegrity(modelId);
// integrity.envelopeResult:      BreachResult from CognitiveEnvelope
// integrity.fingerprintDrift:    DriftResult from SVD comparison
// integrity.compositeMultiplier: Combined I(theta) multiplier
```
Recommended Actions
- Collect baseline observations over 24+ hours before enabling enforcement
- Start with `k = 3.0` (default) -- tighten to 2.5 only if false negatives are a concern and you can tolerate higher false positive rates
- Monitor breach counter trends as leading indicators
- Set up alerts for `trustMultiplier < 1.0` transitions
- Use `reset()` sparingly -- only after confirming the root cause is resolved
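The alerting recommendation can be sketched as a small watcher over the multiplier stream; the callback wiring below is hypothetical, not part of the CognitiveEnvelope API:

```typescript
// Fire an alert when trustMultiplier first drops below 1.0
// (hypothetical helper, not part of the CognitiveEnvelope API).
function watchMultiplier(
  onAlert: (prev: number, next: number) => void
): (multiplier: number) => void {
  let prev = 1.0;
  return (multiplier) => {
    if (prev === 1.0 && multiplier < 1.0) onAlert(prev, multiplier);
    prev = multiplier;
  };
}

// Feed result.trustMultiplier into onCheck after every checkBreach call.
const alerts: string[] = [];
const onCheck = watchMultiplier((prev, next) =>
  alerts.push(`trust multiplier dropped: ${prev} -> ${next}`)
);
```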
Next Steps
- ParameSphere Fingerprinting -- SVD-based model integrity
- Canary Probes -- Behavioral verification from the other direction
- Circuit Breakers in Depth -- What happens when breach escalates