Security · Updated 2026-04-02

ParameSphere Fingerprinting

SVD-based model fingerprinting for detecting drift and weight tampering, with the I(theta) integrity multiplier.

ParameSphere is a model fingerprinting system that uses Singular Value Decomposition (SVD) to create compact, comparable fingerprints of AI model weights. When a model's fingerprint drifts from its baseline, the system reduces the agent's effective trust score through the I(theta) integrity multiplier.

The implementation lives in packages/atsf-core/src/paramesphere/paramesphere-engine.ts with continuous monitoring in packages/a3i/src/observation/continuous-monitor.ts.


Why SVD Fingerprinting

Model weights are high-dimensional (billions of parameters for modern LLMs). Comparing raw weights is computationally prohibitive and produces noisy results. SVD extracts the most significant structural features of a weight matrix into a compact vector -- the top-K singular values.

Key properties:

  • Compression: A 7B-parameter model reduces to 64 singular values
  • Sensitivity: Weight tampering changes singular value structure
  • Stability: Normal inference does not change singular values
  • Speed: Power iteration extracts top-K without full decomposition

How It Works

Step 1: Layer Sampling

Not every layer needs fingerprinting. The engine samples a configurable fraction of layers (default: 15%):

import { ParameSphereEngine } from '@vorionsys/atsf-core';

const engine = new ParameSphereEngine({
  K: 64,               // Top-K singular values to extract
  layerSampleRatio: 0.15, // Sample 15% of layers
  cacheSize: 5,        // Keep last 5 fingerprints for comparison
  driftThreshold: 0.05, // Cosine distance threshold for drift
  dualHash: false,     // Set to true to enable SHA-256 + SHA3-256 for high assurance
});
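This document does not describe how the engine chooses which layers to sample. As a minimal sketch, assuming an evenly spaced selection over an ordered list of layer names, the ratio could be applied like this. The function `sampleLayers` and the layer naming are hypothetical.

```typescript
// Hypothetical helper: pick an evenly spaced subset of layer names.
// The engine's real sampling strategy may differ.
function sampleLayers(layerNames: string[], ratio: number): string[] {
  if (ratio <= 0 || layerNames.length === 0) return [];
  // Sample at least one layer and never more than exist.
  const count = Math.min(
    layerNames.length,
    Math.max(1, Math.round(layerNames.length * ratio)),
  );
  const stride = layerNames.length / count;
  const picked: string[] = [];
  for (let i = 0; i < count; i++) {
    picked.push(layerNames[Math.floor(i * stride)]);
  }
  return picked;
}

const demoLayerNames = Array.from({ length: 40 }, (_, i) => `layer.${i}`);
const sampledLayers = sampleLayers(demoLayerNames, 0.15);
console.log(sampledLayers.length); // 6
```

An evenly spaced choice is deterministic, so two fingerprints of the same model cover the same layers and remain comparable.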

Step 2: SVD via Power Iteration

For each sampled layer, the engine computes the top-K singular values using deflated power iteration:

// Internal: compute top-K singular values of a weight matrix
// Uses power iteration on A^T*A -- no external linear algebra library needed
function topKSingularValues(
  data: Float64Array,   // Row-major flat array
  rows: number,
  cols: number,
  k: number,
): Float64Array;
// Returns Float64Array of length min(k, min(rows, cols))
// in descending order

The power iteration approach was chosen deliberately:

  • No dependency on NumPy, LAPACK, or other native libraries
  • Works in any JavaScript/TypeScript runtime
  • Deterministic seeding produces reproducible results
  • 300 max iterations with 1e-10 tolerance for convergence
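To make the technique concrete, here is a minimal sketch of deflated power iteration on A^T*A. It is not the engine's internal code: the name `topKSingularValuesSketch`, the fixed start vector, and the explicit Gram matrix (practical only for small matrices) are assumptions made for illustration. The iteration limit and tolerance mirror the values listed above.

```typescript
// Illustrative sketch only; structure and names are assumptions.
function topKSingularValuesSketch(
  data: Float64Array, // Row-major flat array
  rows: number,
  cols: number,
  k: number,
  maxIter = 300,
  tol = 1e-10,
): Float64Array {
  // Gram matrix G = A^T * A (cols x cols). Its eigenvalues are the
  // squared singular values of A.
  const g = new Float64Array(cols * cols);
  for (let r = 0; r < rows; r++) {
    for (let i = 0; i < cols; i++) {
      const a = data[r * cols + i];
      for (let j = 0; j < cols; j++) g[i * cols + j] += a * data[r * cols + j];
    }
  }

  const count = Math.min(k, rows, cols);
  const result = new Float64Array(count);

  for (let s = 0; s < count; s++) {
    // Fixed (non-random) start vector keeps runs reproducible.
    let v = new Float64Array(cols);
    for (let i = 0; i < cols; i++) v[i] = 1 / (i + 1 + s);
    scaleToUnit(v);

    let lambda = 0;
    for (let iter = 0; iter < maxIter; iter++) {
      // w = G * v
      const w = new Float64Array(cols);
      for (let i = 0; i < cols; i++) {
        let acc = 0;
        for (let j = 0; j < cols; j++) acc += g[i * cols + j] * v[j];
        w[i] = acc;
      }
      // For a unit vector v, |G v| approaches the dominant eigenvalue.
      const norm = scaleToUnit(w);
      if (norm === 0) { lambda = 0; break; }
      const converged = Math.abs(norm - lambda) < tol;
      lambda = norm;
      v = w;
      if (converged) break;
    }

    result[s] = Math.sqrt(lambda);

    // Deflation: subtract the component just found so the next pass
    // converges to the next-largest eigenvalue.
    for (let i = 0; i < cols; i++) {
      for (let j = 0; j < cols; j++) g[i * cols + j] -= lambda * v[i] * v[j];
    }
  }
  return result;
}

// Normalizes in place and returns the original Euclidean norm.
function scaleToUnit(v: Float64Array): number {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  if (norm > 0) for (let i = 0; i < v.length; i++) v[i] /= norm;
  return norm;
}

// The matrix [[3, 0], [0, 4]] has singular values 4 and 3.
const demoSingularValues = topKSingularValuesSketch(new Float64Array([3, 0, 0, 4]), 2, 2, 2);
console.log(demoSingularValues); // approximately [4, 3]
```

For real layer sizes, forming A^T*A explicitly would be too expensive; a production version would more plausibly apply A and A^T to the vector in turn.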

Step 3: Fingerprint Construction

The fingerprint combines SVD singular values with activation statistics and a cryptographic hash:

interface ParameSphereFingerprint {
  modelId: string;
  singularValues: Float64Array;  // Top-K values from SVD
  activationStats: ActivationStats; // Mean, variance, kurtosis
  hash: string;                  // SHA-256 of composite vector
  sha3Hash?: string;             // Optional SHA3-256 (dual-hash mode)
  capturedAt: Date;
  parameterCount: number;
  layersSampled: number;
}
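The `ActivationStats` type is not defined in this document beyond the comment "Mean, variance, kurtosis". A minimal sketch of how those three moments could be computed from a flat activation sample follows. The field names, the use of population variance, and the non-excess kurtosis convention (a normal distribution scores 3) are all assumptions.

```typescript
// Hypothetical shape; the real ActivationStats type may differ.
interface ActivationStatsSketch {
  mean: number;
  variance: number; // Population variance (divide by n)
  kurtosis: number; // Non-excess kurtosis: m4 / m2^2
}

function computeActivationStats(values: Float64Array): ActivationStatsSketch {
  const n = values.length;
  if (n === 0) throw new Error('Cannot compute statistics of an empty sample');

  let sum = 0;
  for (let i = 0; i < n; i++) sum += values[i];
  const mean = sum / n;

  // Second and fourth central moments.
  let m2 = 0;
  let m4 = 0;
  for (let i = 0; i < n; i++) {
    const d = values[i] - mean;
    m2 += d * d;
    m4 += d * d * d * d;
  }
  m2 /= n;
  m4 /= n;

  // A constant sample has no spread; report kurtosis as 0 by convention here.
  const kurtosis = m2 === 0 ? 0 : m4 / (m2 * m2);
  return { mean, variance: m2, kurtosis };
}

const demoStats = computeActivationStats(new Float64Array([1, 2, 3, 4, 5]));
console.log(demoStats); // mean 3, variance 2, kurtosis about 1.7
```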

Step 4: Comparison

Two fingerprints are compared using cosine distance on their singular value vectors:

const comparison: FingerprintComparison = engine.compare(baseline, current);

// comparison.cosineDistance: 0 (same direction) to 1 (orthogonal)
// comparison.euclideanDistance: absolute distance
// comparison.hashMatch: boolean (definitive same/different)
// comparison.layerDrifts: per-layer drift breakdown
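The two distance measures can be sketched as follows. This is an illustration under assumptions, not the engine's code: the function names are hypothetical, and treating a zero vector as maximally distant is a choice made here.

```typescript
// Cosine distance = 1 - cosine similarity.
function cosineDistance(a: Float64Array, b: Float64Array): number {
  if (a.length !== b.length) throw new Error('Vectors must have equal length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector has no direction, so report maximum distance.
  if (normA === 0 || normB === 0) return 1;
  const cosine = dot / Math.sqrt(normA * normB);
  // Clamp to guard against floating-point overshoot.
  return 1 - Math.min(1, Math.max(-1, cosine));
}

function euclideanDistance(a: Float64Array, b: Float64Array): number {
  if (a.length !== b.length) throw new Error('Vectors must have equal length');
  let sumSq = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sumSq += d * d;
  }
  return Math.sqrt(sumSq);
}

const baselineValues = new Float64Array([3, 4]);
const scaledValues = new Float64Array([6, 8]);
console.log(cosineDistance(baselineValues, scaledValues));    // 0
console.log(euclideanDistance(baselineValues, scaledValues)); // 5
```

The example shows why both measures are reported: uniformly scaling every singular value leaves the cosine distance at 0, while the Euclidean distance still registers the change. Because singular values are non-negative, the cosine distance between two fingerprints stays within 0 to 1.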

Drift Detection

The continuous monitor (packages/a3i/src/observation/continuous-monitor.ts) runs fingerprint checks on a configurable schedule.

Drift Severity Classification

| Cosine Distance | Severity | Meaning |
|-----------------|----------|---------|
| < 0.01 | None | Normal variance, no concern |
| 0.01 - 0.05 | Minor | Small shift, monitor closely |
| 0.05 - 0.15 | Significant | Likely fine-tuning or adaptation |
| > 0.15 | Critical | Probable model swap or major tampering |

import { ContinuousMonitorConfig } from '@vorionsys/a3i';

const monitorConfig: ContinuousMonitorConfig = {
  pollingIntervalMs: 60_000,           // Check every 60 seconds
  minorDriftThreshold: 0.01,
  significantDriftThreshold: 0.05,
  criticalDriftThreshold: 0.15,
  topK: 8,                             // Quick check uses fewer values
  checkWeightHash: true,               // Also compare SHA-256 of full weights
};
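The severity bands can be sketched as a small classifier. The function name `classifyDrift` is hypothetical, and the handling of boundary values is an assumption: the table does not say which band 0.05 belongs to, so this sketch assigns a shared boundary to the higher band, while 0.15 itself stays "significant" because the critical band is stated as "> 0.15".

```typescript
type DriftSeverity = 'none' | 'minor' | 'significant' | 'critical';

interface SeverityThresholds {
  minor: number;
  significant: number;
  critical: number;
}

// Defaults taken from the monitor configuration shown above.
const DEFAULT_THRESHOLDS: SeverityThresholds = {
  minor: 0.01,
  significant: 0.05,
  critical: 0.15,
};

function classifyDrift(
  cosineDist: number,
  thresholds: SeverityThresholds = DEFAULT_THRESHOLDS,
): DriftSeverity {
  if (cosineDist > thresholds.critical) return 'critical';
  if (cosineDist >= thresholds.significant) return 'significant';
  if (cosineDist >= thresholds.minor) return 'minor';
  return 'none';
}

console.log(classifyDrift(0.005)); // none
console.log(classifyDrift(0.03));  // minor
console.log(classifyDrift(0.08));  // significant
console.log(classifyDrift(0.2));   // critical
```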

Weight Hash as Definitive Check

The cosine distance is a fuzzy measure -- useful for detecting gradual drift but not definitive for binary "same model or not" questions. The weight hash provides that:

interface DriftCheckResult {
  cosineDistance: number;
  driftDetected: boolean;
  weightHashChanged: boolean;  // Definitive: weights were modified
  severity: 'none' | 'minor' | 'significant' | 'critical';
}

If weightHashChanged is true, the model's weights were definitively modified since the baseline was captured. This is a stronger signal than cosine distance and warrants immediate investigation.


The I(theta) Integrity Multiplier

ParameSphere's primary output is the integrity multiplier I(theta), which adjusts the agent's composite trust score:

S_composite = S_behavioral * I(theta)

Where:

  • S_behavioral is the trust score from behavioral evaluation (canary probes, proof chain, etc.)
  • I(theta) is the integrity multiplier from ParameSphere (0.0 to 1.0)

Computing I(theta)

The integrity multiplier combines fingerprint drift with the Cognitive Envelope's trust multiplier:

interface IntegrityMultiplier {
  value: number;           // 0.0 to 1.0
  fingerprintDrift: number; // Cosine distance from baseline
  envelopeTrustMultiplier: number; // From CognitiveEnvelope
  components: {
    svdComponent: number;   // Based on singular value stability
    hashComponent: number;  // 1.0 if hash matches, 0.5 if not
    envelopeComponent: number; // From cognitive envelope
  };
}

The default weighting:

const DEFAULT_INTEGRITY_WEIGHTS = {
  svd: 0.4,       // SVD fingerprint stability
  hash: 0.3,      // Weight hash match
  envelope: 0.3,  // Cognitive envelope state
};

// I(theta) = weights.svd * svdComponent
//          + weights.hash * hashComponent
//          + weights.envelope * envelopeComponent
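The weighted sum can be written out as a small function. This is a sketch of the formula only: the names are hypothetical, the clamp to the 0.0 to 1.0 range is an assumption, and how the engine derives each component from raw drift is not covered here.

```typescript
interface IntegrityComponentsSketch {
  svdComponent: number;      // Singular value stability, 0.0 to 1.0
  hashComponent: number;     // 1.0 if the hash matches, 0.5 if not
  envelopeComponent: number; // From the cognitive envelope, 0.0 to 1.0
}

interface IntegrityWeightsSketch {
  svd: number;
  hash: number;
  envelope: number;
}

const SKETCH_WEIGHTS: IntegrityWeightsSketch = { svd: 0.4, hash: 0.3, envelope: 0.3 };

function integrityMultiplier(
  components: IntegrityComponentsSketch,
  weights: IntegrityWeightsSketch = SKETCH_WEIGHTS,
): number {
  const raw =
    weights.svd * components.svdComponent +
    weights.hash * components.hashComponent +
    weights.envelope * components.envelopeComponent;
  // Assumption: clamp so custom weights cannot push the result outside 0..1.
  return Math.min(1, Math.max(0, raw));
}

const cleanModel = integrityMultiplier({
  svdComponent: 1.0,
  hashComponent: 1.0,
  envelopeComponent: 1.0,
});
console.log(cleanModel.toFixed(2)); // "1.00"
```

With the default weights and a hash component that never falls below 0.5, the lowest reachable value is 0.3 * 0.5 = 0.15, which occurs when both other components are 0.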

Impact on Trust Score

A concrete example:

Agent at T4 Standard, behavioral score 700
ParameSphere detects significant drift (cosine = 0.08)

SVD component: 0.85 (moderate drift penalty)
Hash component: 0.5 (weights changed)
Envelope component: 0.9 (approaching breach)

I(theta) = 0.4 * 0.85 + 0.3 * 0.5 + 0.3 * 0.9
         = 0.34 + 0.15 + 0.27
         = 0.76

S_composite = 700 * 0.76 = 532

Agent drops from T4 (650-799) to T3 (500-649)
Capabilities reduced accordingly
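The same example can be reproduced in code. This sketch encodes only the two tier bands quoted above; the helper names and the rounding to the nearest integer are assumptions.

```typescript
// Only the tier bands quoted in the example are known here.
const KNOWN_TIER_BANDS = [
  { tier: 'T4', min: 650, max: 799 },
  { tier: 'T3', min: 500, max: 649 },
];

// Assumption: the composite score is rounded to the nearest integer.
function compositeScore(behavioral: number, integrityValue: number): number {
  return Math.round(behavioral * integrityValue);
}

function knownTierFor(score: number): string | null {
  for (const band of KNOWN_TIER_BANDS) {
    if (score >= band.min && score <= band.max) return band.tier;
  }
  return null; // Outside the bands quoted in this document
}

const exampleIntegrity = 0.4 * 0.85 + 0.3 * 0.5 + 0.3 * 0.9; // about 0.76
const exampleComposite = compositeScore(700, exampleIntegrity);
console.log(exampleComposite); // 532
console.log(knownTierFor(700), '->', knownTierFor(exampleComposite)); // T4 -> T3
```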

ParameSphereEngine API

class ParameSphereEngine {
  constructor(config?: Partial<ParameSphereConfig>);

  /** Generate a fingerprint from model weights */
  fingerprint(
    modelId: string,
    layers: Map<string, Float64Array>,
    activations?: ActivationStats,
  ): ParameSphereFingerprint;

  /** Compare two fingerprints */
  compare(
    baseline: ParameSphereFingerprint,
    current: ParameSphereFingerprint,
  ): FingerprintComparison;

  /** Check for drift against stored baseline */
  checkDrift(modelId: string, current: ParameSphereFingerprint): DriftResult;

  /** Compute integrity multiplier */
  computeIntegrity(
    drift: DriftResult,
    envelopeMultiplier: number,
  ): IntegrityMultiplier;

  /** Store a fingerprint as the new baseline */
  setBaseline(fingerprint: ParameSphereFingerprint): void;
}

Fingerprint Storage

The IFingerprintStore interface allows pluggable storage backends:

interface IFingerprintStore {
  store(fingerprint: StoredFingerprint): Promise<void>;
  getBaseline(modelId: string): Promise<StoredFingerprint | null>;
  getHistory(modelId: string, limit: number): Promise<StoredFingerprint[]>;
}
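As an illustration of a pluggable backend, here is a minimal in-memory store suitable for tests. The `StoredFingerprint` type is not defined in this document, so the record shape below (including the `isBaseline` flag) is hypothetical, as are the class and interface names.

```typescript
// Hypothetical record shape; the real StoredFingerprint type may differ.
interface StoredFingerprintSketch {
  modelId: string;
  hash: string;
  capturedAt: Date;
  isBaseline: boolean; // Assumed flag marking baseline captures
}

// Mirrors the three methods of IFingerprintStore shown above.
interface FingerprintStoreSketch {
  store(fingerprint: StoredFingerprintSketch): Promise<void>;
  getBaseline(modelId: string): Promise<StoredFingerprintSketch | null>;
  getHistory(modelId: string, limit: number): Promise<StoredFingerprintSketch[]>;
}

class InMemoryFingerprintStore implements FingerprintStoreSketch {
  private readonly byModel = new Map<string, StoredFingerprintSketch[]>();

  async store(fingerprint: StoredFingerprintSketch): Promise<void> {
    const list = this.byModel.get(fingerprint.modelId) || [];
    list.push(fingerprint);
    this.byModel.set(fingerprint.modelId, list);
  }

  async getBaseline(modelId: string): Promise<StoredFingerprintSketch | null> {
    const list = this.byModel.get(modelId) || [];
    // The most recently stored baseline wins.
    for (let i = list.length - 1; i >= 0; i--) {
      if (list[i].isBaseline) return list[i];
    }
    return null;
  }

  async getHistory(modelId: string, limit: number): Promise<StoredFingerprintSketch[]> {
    if (limit <= 0) return [];
    const list = this.byModel.get(modelId) || [];
    // Newest first, capped at `limit`.
    return list.slice(-limit).reverse();
  }
}

const memoryStore = new InMemoryFingerprintStore();
memoryStore.store({ modelId: 'demo', hash: 'aaa', capturedAt: new Date(0), isBaseline: true });
memoryStore.store({ modelId: 'demo', hash: 'bbb', capturedAt: new Date(1000), isBaseline: false });
memoryStore.getHistory('demo', 1).then((history) => console.log(history[0].hash)); // bbb
```

A persistent backend would implement the same three methods against a database, leaving callers unchanged.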

Attack Scenarios

Scenario 1: Backdoor Injection

An attacker modifies specific weight matrices to introduce a trigger phrase that causes the model to output a specific response.

Detection: Even targeted weight modifications change the singular value structure, and the weight hash changes definitively. The hash component drops to 0.5 and the SVD component drops according to the magnitude of the change, so I(theta) decreases and the trust score falls.

Scenario 2: Model Swap

An attacker replaces the entire model with a different one (e.g., swapping a safe model for an uncensored variant).

Detection: Cosine distance exceeds 0.15 (critical threshold) and the weight hash changes. If the SVD component falls to 0.0 and the envelope component to 0.5, I(theta) drops to 0.30 (0.4 * 0.0 + 0.3 * 0.5 + 0.3 * 0.5). Agent trust plummets and the circuit breaker likely trips.

Scenario 3: Unauthorized Fine-Tuning

A developer fine-tunes the production model without going through the governance process.

Detection: Singular values drift gradually over the fine-tuning session. The continuous monitor detects the shift within one polling interval (60 seconds in the configuration shown above) or immediately in streaming mode with framework hooks. The severity is classified as "significant" if the cosine distance falls between 0.05 and 0.15.

Scenario 4: Quantization Attack

An attacker provides a maliciously quantized model version that behaves differently at certain precision levels.

Detection: Quantization changes the weight distribution, which changes the singular values, so the SVD fingerprint registers drift. The weight hash also changes, so the attack is detected through both channels.


Continuous Monitoring Setup

import { ContinuousParameSphereMonitor } from '@vorionsys/a3i';

const monitor = new ContinuousParameSphereMonitor({
  pollingIntervalMs: 60_000,
  criticalDriftThreshold: 0.15,
  significantDriftThreshold: 0.05,
  topK: 8,
  checkWeightHash: true,
});

// Register a model for monitoring
await monitor.registerModel(modelId, baselineFingerprint);

// Start monitoring
monitor.start();

// Listen for drift events
monitor.on('drift', (result: DriftCheckResult) => {
  if (result.severity === 'critical') {
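    // Immediate response: freeze agent, alert operators.
    // Note: modelId is not part of the DriftCheckResult shape shown earlier;
    // this assumes the emitted event also carries the model ID.
    // agentFreezeService stands in for your own incident-response hook.
    agentFreezeService.freeze(result.modelId, 'Critical model drift detected');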
  }
});

Recommended Actions

  1. Capture baseline fingerprints during model deployment, before any production traffic
  2. Use dual-hash mode (dualHash: true) for high-assurance environments
  3. Monitor drift trends over time -- even "minor" drift that persists may indicate a slow attack
  4. Set up alerts for any weightHashChanged: true event -- this is definitive evidence of model modification
  5. Integrate I(theta) into your trust score display so operators see integrity alongside behavioral trust
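For recommendation 3, one simple way to flag persistent minor drift is to check whether every recent reading sits at or above the minor threshold. This is a hypothetical helper, not part of the documented API, and the window size of 5 is an arbitrary choice for illustration.

```typescript
// Hypothetical helper: true when the last `windowSize` cosine distances
// all sit at or above the minor drift threshold.
function hasPersistentDrift(
  recentDistances: number[],
  minorThreshold = 0.01,
  windowSize = 5,
): boolean {
  if (windowSize <= 0 || recentDistances.length < windowSize) return false;
  const tail = recentDistances.slice(-windowSize);
  for (const distance of tail) {
    if (distance < minorThreshold) return false;
  }
  return true;
}

// A single spike does not count; sustained drift does.
console.log(hasPersistentDrift([0.002, 0.02, 0.003, 0.002, 0.004]));       // false
console.log(hasPersistentDrift([0.002, 0.012, 0.013, 0.015, 0.02, 0.018])); // true
```

A check like this could feed an alert even when no individual reading reaches the "significant" band.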

Next Steps