ParameSphere Fingerprinting
SVD-based model fingerprinting for detecting drift and weight tampering, with the I(theta) integrity multiplier.
ParameSphere is a model fingerprinting system that uses Singular Value Decomposition (SVD) to create compact, comparable fingerprints of AI model weights. When a model's fingerprint drifts from its baseline, the system reduces the agent's effective trust score through the I(theta) integrity multiplier.
The implementation lives in packages/atsf-core/src/paramesphere/paramesphere-engine.ts
with continuous monitoring in packages/a3i/src/observation/continuous-monitor.ts.
Why SVD Fingerprinting
Model weights are high-dimensional (billions of parameters for modern LLMs). Comparing raw weights is computationally prohibitive and produces noisy results. SVD extracts the most significant structural features of a weight matrix into a compact vector -- the top-K singular values.
Key properties:
- Compression: A 7B-parameter model reduces to 64 singular values
- Sensitivity: Weight tampering changes singular value structure
- Stability: Inference does not modify weights, so singular values stay constant during normal operation
- Speed: Power iteration extracts top-K without full decomposition
How It Works
Step 1: Layer Sampling
Not every layer needs fingerprinting. The engine samples a configurable fraction of layers (default: 15%):
import { ParameSphereEngine } from '@vorionsys/atsf-core';
const engine = new ParameSphereEngine({
K: 64, // Top-K singular values to extract
layerSampleRatio: 0.15, // Sample 15% of layers
cacheSize: 5, // Keep last 5 fingerprints for comparison
driftThreshold: 0.05, // Cosine distance threshold for drift
dualHash: false, // Set to true to add SHA3-256 alongside SHA-256 (high assurance)
});
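The selection strategy behind layerSampleRatio is not described here. As a rough sketch, assuming a deterministic, evenly spaced pick over sorted layer names (the helper name sampleLayerNames is hypothetical and not part of the package), sampling could look like this:

```typescript
// Hypothetical helper, not the engine's actual API.
// Picks an evenly spaced, deterministic subset of layer names.
function sampleLayerNames(layerNames: string[], ratio: number): string[] {
  if (ratio <= 0 || layerNames.length === 0) return [];
  // Sort so the same model always yields the same sample.
  const sorted = [...layerNames].sort();
  // Always keep at least one layer, never more than exist.
  const count = Math.min(
    sorted.length,
    Math.max(1, Math.round(sorted.length * ratio)),
  );
  const step = sorted.length / count;
  const picked: string[] = [];
  for (let i = 0; i < count; i++) {
    picked.push(sorted[Math.floor(i * step)]);
  }
  return picked;
}
```

A deterministic pick matters because the baseline and every later fingerprint must cover the same layers for their singular values to be comparable.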
Step 2: SVD via Power Iteration
For each sampled layer, the engine computes the top-K singular values using deflated power iteration:
// Internal: compute top-K singular values of a weight matrix
// Uses power iteration on A^T*A -- no external linear algebra library needed
function topKSingularValues(
data: Float64Array, // Row-major flat array
rows: number,
cols: number,
k: number,
): Float64Array;
// Returns Float64Array of length min(k, min(rows, cols))
// in descending order
The power iteration approach was chosen deliberately:
- No dependency on NumPy, LAPACK, or other native libraries
- Works in any JavaScript/TypeScript runtime
- Deterministic seeding produces reproducible results
- 300 max iterations with 1e-10 tolerance for convergence
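To make the approach concrete, here is a minimal sketch of deflated power iteration on A^T*A. It follows the signature and the iteration limits listed above, but the function name topKSingularValuesSketch, the seeded generator, and the convergence test are assumptions for illustration; the engine's internals may differ.

```typescript
// Sketch only: deflated power iteration on the Gram matrix G = A^T * A.
// The eigenvalues of G are the squared singular values of A.
function topKSingularValuesSketch(
  data: Float64Array, // Row-major flat array
  rows: number,
  cols: number,
  k: number,
  maxIter = 300,
  tol = 1e-10,
): Float64Array {
  const n = Math.min(k, rows, cols);
  const g = new Float64Array(cols * cols);
  for (let r = 0; r < rows; r++) {
    for (let i = 0; i < cols; i++) {
      const a = data[r * cols + i];
      for (let j = 0; j < cols; j++) g[i * cols + j] += a * data[r * cols + j];
    }
  }
  // Assumed seeding scheme: a fixed-seed LCG keeps runs reproducible.
  let seed = 42;
  const rand = (): number => {
    seed = (seed * 1664525 + 1013904223) % 4294967296;
    return seed / 4294967296 - 0.5;
  };
  const out = new Float64Array(n);
  for (let s = 0; s < n; s++) {
    let v = new Float64Array(cols);
    for (let i = 0; i < cols; i++) v[i] = rand();
    scaleToUnit(v);
    let lambda = 0;
    for (let iter = 0; iter < maxIter; iter++) {
      const w = new Float64Array(cols);
      let rayleigh = 0; // v^T G v, the current eigenvalue estimate
      for (let i = 0; i < cols; i++) {
        let sum = 0;
        for (let j = 0; j < cols; j++) sum += g[i * cols + j] * v[j];
        w[i] = sum;
        rayleigh += v[i] * sum;
      }
      if (scaleToUnit(w) === 0) { lambda = 0; break; } // remaining spectrum is zero
      v = w;
      const done = Math.abs(rayleigh - lambda) <= tol * Math.max(1, Math.abs(rayleigh));
      lambda = rayleigh;
      if (done) break;
    }
    out[s] = Math.sqrt(Math.max(lambda, 0));
    // Deflate: remove the found component so the next pass finds the next one.
    for (let i = 0; i < cols; i++) {
      for (let j = 0; j < cols; j++) g[i * cols + j] -= lambda * v[i] * v[j];
    }
  }
  return out;
}

// Normalizes v in place and returns its original Euclidean norm.
function scaleToUnit(v: Float64Array): number {
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);
  if (norm > 0) for (let i = 0; i < v.length; i++) v[i] /= norm;
  return norm;
}
```

Because each pass converges to the largest remaining eigenvalue, the results come out in descending order without a separate sort. Convergence slows when neighboring singular values are close together, which is why an iteration cap is needed alongside the tolerance.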
Step 3: Fingerprint Construction
The fingerprint combines SVD singular values with activation statistics and a cryptographic hash:
interface ParameSphereFingerprint {
modelId: string;
singularValues: Float64Array; // Top-K values from SVD
activationStats: ActivationStats; // Mean, variance, kurtosis
hash: string; // SHA-256 of composite vector
sha3Hash?: string; // Optional SHA3-256 (dual-hash mode)
capturedAt: Date;
parameterCount: number;
layersSampled: number;
}
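The exact layout of the composite vector is not specified here. The sketch below assumes it is the singular values followed by the three activation statistics, hashed as raw Float64 bytes with Node's built-in crypto module. The names ActivationStatsSketch, activationStats, and hashCompositeVector are hypothetical, and the kurtosis convention (non-excess, population moments) is also an assumption.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape; the real ActivationStats type may carry more fields.
interface ActivationStatsSketch {
  mean: number;
  variance: number;
  kurtosis: number;
}

// Population mean, variance, and non-excess kurtosis (3 for a normal distribution).
function activationStats(samples: Float64Array): ActivationStatsSketch {
  const n = samples.length;
  if (n === 0) return { mean: 0, variance: 0, kurtosis: 0 };
  let mean = 0;
  for (let i = 0; i < n; i++) mean += samples[i];
  mean /= n;
  let m2 = 0;
  let m4 = 0;
  for (let i = 0; i < n; i++) {
    const d = samples[i] - mean;
    m2 += d * d;
    m4 += d * d * d * d;
  }
  m2 /= n;
  m4 /= n;
  return { mean, variance: m2, kurtosis: m2 > 0 ? m4 / (m2 * m2) : 0 };
}

// Assumed composite layout: [singular values..., mean, variance, kurtosis].
function hashCompositeVector(
  singularValues: Float64Array,
  stats: ActivationStatsSketch,
): string {
  const composite = new Float64Array(singularValues.length + 3);
  composite.set(singularValues, 0);
  composite.set([stats.mean, stats.variance, stats.kurtosis], singularValues.length);
  // Hashes the platform's native byte order; a cross-platform store would fix the endianness.
  return createHash('sha256').update(composite).digest('hex');
}
```

Hashing the composite vector rather than the full weights keeps the fingerprint hash cheap to recompute, at the cost of only covering what the fingerprint itself captures.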
Step 4: Comparison
Two fingerprints are compared using cosine distance on their singular value vectors:
const comparison: FingerprintComparison = engine.compare(baseline, current);
// comparison.cosineDistance: 0 (identical) to 1 (orthogonal)
// comparison.euclideanDistance: absolute distance
// comparison.hashMatch: boolean (definitive same/different)
// comparison.layerDrifts: per-layer drift breakdown
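For reference, the two distance measures can be sketched as standalone functions. These are standard definitions, not the engine's own code; the function names are illustrative, and the handling of all-zero vectors is an assumption.

```typescript
// Cosine distance = 1 - cosine similarity.
// Singular values are non-negative, so the result stays within [0, 1].
function cosineDistance(a: Float64Array, b: Float64Array): number {
  if (a.length !== b.length) throw new Error('vectors must have equal length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumed convention: two zero vectors are identical, one zero vector is maximally distant.
  if (normA === 0 || normB === 0) return normA === normB ? 0 : 1;
  const similarity = dot / Math.sqrt(normA * normB);
  return Math.max(0, 1 - similarity); // clamp tiny negative rounding errors
}

// Euclidean distance: sensitive to overall scale, unlike cosine distance.
function euclideanDistance(a: Float64Array, b: Float64Array): number {
  if (a.length !== b.length) throw new Error('vectors must have equal length');
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}
```

The two measures complement each other: uniformly scaling every weight leaves cosine distance at zero but shows up in the Euclidean distance.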
Drift Detection
The continuous monitor (packages/a3i/src/observation/continuous-monitor.ts)
runs fingerprint checks on a configurable schedule.
Drift Severity Classification
| Cosine Distance | Severity | Meaning |
|----------------|----------|---------|
| < 0.01 | None | Normal variance, no concern |
| 0.01 - 0.05 | Minor | Small shift, monitor closely |
| 0.05 - 0.15 | Significant | Likely fine-tuning or adaptation |
| > 0.15 | Critical | Probable model swap or major tampering |
import { ContinuousMonitorConfig } from '@vorionsys/a3i';
const monitorConfig: ContinuousMonitorConfig = {
pollingIntervalMs: 60_000, // Check every 60 seconds
minorDriftThreshold: 0.01,
significantDriftThreshold: 0.05,
criticalDriftThreshold: 0.15,
topK: 8, // Quick check uses fewer values
checkWeightHash: true, // Also compare SHA-256 of full weights
};
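The thresholds map onto the severity table as in the sketch below. The function name classifyDrift is hypothetical, and the boundary handling is an assumption: the table leaves open which band owns exactly 0.05, so this version treats each lower bound as inclusive and reserves "critical" for values strictly above 0.15.

```typescript
type DriftSeverity = 'none' | 'minor' | 'significant' | 'critical';

interface DriftThresholds {
  minor: number;
  significant: number;
  critical: number;
}

// Hypothetical helper mirroring the severity table.
function classifyDrift(
  cosineDistance: number,
  thresholds: DriftThresholds = { minor: 0.01, significant: 0.05, critical: 0.15 },
): DriftSeverity {
  if (cosineDistance > thresholds.critical) return 'critical';
  if (cosineDistance >= thresholds.significant) return 'significant';
  if (cosineDistance >= thresholds.minor) return 'minor';
  return 'none';
}
```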
Weight Hash as Definitive Check
The cosine distance is a fuzzy measure -- useful for detecting gradual drift but not definitive for binary "same model or not" questions. The weight hash provides that:
interface DriftCheckResult {
cosineDistance: number;
driftDetected: boolean;
weightHashChanged: boolean; // Definitive: weights were modified
severity: 'none' | 'minor' | 'significant' | 'critical';
}
If weightHashChanged is true, the model's weights were definitively
modified since the baseline was captured. This is a stronger signal than
cosine distance and warrants immediate investigation.
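How the monitor serializes weights before hashing is not documented here. One plausible sketch, assuming layers are hashed in sorted name order so that map insertion order cannot affect the digest (the function name weightHash is hypothetical):

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: SHA-256 over every layer's name and raw Float64 bytes.
function weightHash(layers: Map<string, Float64Array>): string {
  const hash = createHash('sha256');
  // Sort by layer name so the digest does not depend on insertion order.
  const entries = Array.from(layers.entries()).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );
  for (const [name, weights] of entries) {
    hash.update(name);
    hash.update(weights); // hashed in the platform's native byte order
  }
  return hash.digest('hex');
}
```

Unlike the SVD fingerprint, this digest changes when any single weight in any layer changes, which is what makes it a definitive check.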
The I(theta) Integrity Multiplier
ParameSphere's primary output is the integrity multiplier I(theta), which adjusts the agent's composite trust score:
S_composite = S_behavioral * I(theta)
Where:
- S_behavioral is the trust score from behavioral evaluation (canary probes, proof chain, etc.)
- I(theta) is the integrity multiplier from ParameSphere (0.0 to 1.0)
Computing I(theta)
The integrity multiplier combines fingerprint drift with the Cognitive Envelope's trust multiplier:
interface IntegrityMultiplier {
value: number; // 0.0 to 1.0
fingerprintDrift: number; // Cosine distance from baseline
envelopeTrustMultiplier: number; // From CognitiveEnvelope
components: {
svdComponent: number; // Based on singular value stability
hashComponent: number; // 1.0 if hash matches, 0.5 if not
envelopeComponent: number; // From cognitive envelope
};
}
The default weighting:
const DEFAULT_INTEGRITY_WEIGHTS = {
svd: 0.4, // SVD fingerprint stability
hash: 0.3, // Weight hash match
envelope: 0.3, // Cognitive envelope state
};
// I(theta) = weights.svd * svdComponent
// + weights.hash * hashComponent
// + weights.envelope * envelopeComponent
Impact on Trust Score
A concrete example:
Agent at T4 Standard, behavioral score 700
ParameSphere detects significant drift (cosine = 0.08)
SVD component: 0.85 (moderate drift penalty)
Hash component: 0.5 (weights changed)
Envelope component: 0.9 (approaching breach)
I(theta) = 0.4 * 0.85 + 0.3 * 0.5 + 0.3 * 0.9
= 0.34 + 0.15 + 0.27
= 0.76
S_composite = 700 * 0.76 = 532
Agent drops from T4 (650-799) to T3 (500-649)
Capabilities reduced accordingly
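The arithmetic above can be reproduced with a few lines. This sketch only implements the weighted sum and the composite score; how the engine derives each component from raw drift is not covered, and the names integrityValue and compositeScore are illustrative.

```typescript
interface IntegrityComponents {
  svdComponent: number;
  hashComponent: number;
  envelopeComponent: number;
}

const INTEGRITY_WEIGHTS = { svd: 0.4, hash: 0.3, envelope: 0.3 };

// Weighted sum of the three components, clamped to [0, 1].
function integrityValue(
  c: IntegrityComponents,
  w: typeof INTEGRITY_WEIGHTS = INTEGRITY_WEIGHTS,
): number {
  const value =
    w.svd * c.svdComponent +
    w.hash * c.hashComponent +
    w.envelope * c.envelopeComponent;
  return Math.min(1, Math.max(0, value));
}

// S_composite = S_behavioral * I(theta)
function compositeScore(behavioral: number, integrity: number): number {
  return behavioral * integrity;
}

// Values from the worked example: 0.4 * 0.85 + 0.3 * 0.5 + 0.3 * 0.9 = 0.76
const iTheta = integrityValue({
  svdComponent: 0.85,
  hashComponent: 0.5,
  envelopeComponent: 0.9,
});
const composite = Math.round(compositeScore(700, iTheta));
```

Because the weights sum to 1.0 and every component is at most 1.0, the multiplier can only lower the behavioral score, never raise it.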
ParameSphereEngine API
class ParameSphereEngine {
constructor(config?: Partial<ParameSphereConfig>);
/** Generate a fingerprint from model weights */
fingerprint(
modelId: string,
layers: Map<string, Float64Array>,
activations?: ActivationStats,
): ParameSphereFingerprint;
/** Compare two fingerprints */
compare(
baseline: ParameSphereFingerprint,
current: ParameSphereFingerprint,
): FingerprintComparison;
/** Check for drift against stored baseline */
checkDrift(modelId: string, current: ParameSphereFingerprint): DriftResult;
/** Compute integrity multiplier */
computeIntegrity(
drift: DriftResult,
envelopeMultiplier: number,
): IntegrityMultiplier;
/** Store a fingerprint as the new baseline */
setBaseline(fingerprint: ParameSphereFingerprint): void;
}
Fingerprint Storage
The IFingerprintStore interface allows pluggable storage backends:
interface IFingerprintStore {
store(fingerprint: StoredFingerprint): Promise<void>;
getBaseline(modelId: string): Promise<StoredFingerprint | null>;
getHistory(modelId: string, limit: number): Promise<StoredFingerprint[]>;
}
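As an illustration of a pluggable backend, here is a minimal in-memory store suitable for tests. The StoredFingerprint type is not shown in this page, so the sketch defines a reduced stand-in (StoredFingerprintSketch) with a hypothetical isBaseline flag, and mirrors the interface above under the name FingerprintStoreSketch to stay self-contained.

```typescript
// Reduced stand-in for StoredFingerprint; the isBaseline flag is an assumption.
interface StoredFingerprintSketch {
  modelId: string;
  hash: string;
  capturedAt: Date;
  isBaseline: boolean;
}

// Same three methods as IFingerprintStore, typed against the stand-in.
interface FingerprintStoreSketch {
  store(fingerprint: StoredFingerprintSketch): Promise<void>;
  getBaseline(modelId: string): Promise<StoredFingerprintSketch | null>;
  getHistory(modelId: string, limit: number): Promise<StoredFingerprintSketch[]>;
}

class InMemoryFingerprintStore implements FingerprintStoreSketch {
  private readonly byModel = new Map<string, StoredFingerprintSketch[]>();

  async store(fingerprint: StoredFingerprintSketch): Promise<void> {
    const list = this.byModel.get(fingerprint.modelId) ?? [];
    list.push(fingerprint);
    this.byModel.set(fingerprint.modelId, list);
  }

  // Returns the most recently stored fingerprint flagged as a baseline.
  async getBaseline(modelId: string): Promise<StoredFingerprintSketch | null> {
    const list = this.byModel.get(modelId) ?? [];
    for (let i = list.length - 1; i >= 0; i--) {
      if (list[i].isBaseline) return list[i];
    }
    return null;
  }

  // Returns up to `limit` fingerprints, newest first.
  async getHistory(modelId: string, limit: number): Promise<StoredFingerprintSketch[]> {
    const list = this.byModel.get(modelId) ?? [];
    return [...list]
      .sort((a, b) => b.capturedAt.getTime() - a.capturedAt.getTime())
      .slice(0, Math.max(0, limit));
  }
}
```

A production backend would persist to durable storage and ideally make baselines append-only, so that an attacker who can modify weights cannot also quietly replace the baseline.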
Attack Scenarios
Scenario 1: Backdoor Injection
An attacker modifies specific weight matrices to introduce a trigger phrase that causes the model to output a specific response.
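Detection: Targeted weight modifications change the singular value structure of the layers they touch, although an edit confined to an unsampled layer may not register in the SVD fingerprint. The weight hash changes definitively in either case. The hash component drops to 0.5, and the SVD component drops according to the magnitude of the change. I(theta) decreases and the trust score drops with it.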
Scenario 2: Model Swap
An attacker replaces the entire model with a different one (e.g., swapping a safe model for an uncensored variant).
Detection: Cosine distance exceeds 0.15 (critical threshold). Weight hash changes. I(theta) drops to about 0.30 (0.4 * 0.0 + 0.3 * 0.5 + 0.3 * 0.5, taking the SVD component as 0.0 and the envelope component as 0.5). Agent trust plummets, circuit breaker likely trips.
Scenario 3: Unauthorized Fine-Tuning
A developer fine-tunes the production model without going through the governance process.
Detection: Gradual singular value drift over the fine-tuning session. The continuous monitor detects the shift within one polling interval (60 seconds in the configuration shown here) or immediately (streaming mode with framework hooks). Severity is classified as "significant" if cosine distance is 0.05-0.15.
Scenario 4: Quantization Attack
An attacker provides a maliciously quantized model version that behaves differently at certain precision levels.
Detection: Quantization changes the weight distribution, which changes singular values. The SVD fingerprint captures this as drift. The weight hash also changes. Combined detection through both channels.
Continuous Monitoring Setup
import { ContinuousParameSphereMonitor } from '@vorionsys/a3i';
const monitor = new ContinuousParameSphereMonitor({
pollingIntervalMs: 60_000,
criticalDriftThreshold: 0.15,
significantDriftThreshold: 0.05,
topK: 8,
checkWeightHash: true,
});
// Register a model for monitoring
await monitor.registerModel(modelId, baselineFingerprint);
// Start monitoring
monitor.start();
// Listen for drift events
monitor.on('drift', (result: DriftCheckResult) => {
if (result.severity === 'critical') {
// Immediate response: freeze agent, alert operators
// DriftCheckResult as documented above carries no modelId, so use the registered ID
agentFreezeService.freeze(modelId, 'Critical model drift detected');
}
});
Recommended Actions
- Capture baseline fingerprints during model deployment, before any production traffic
- Use dual-hash mode (dualHash: true) for high-assurance environments
- Monitor drift trends over time -- even "minor" drift that persists may indicate a slow attack
- Set up alerts for any weightHashChanged: true event -- this is definitive evidence of model modification
- Integrate I(theta) into your trust score display so operators see integrity alongside behavioral trust
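For the trend-monitoring recommendation, a simple starting point is to flag a model when several consecutive checks all sit at or above the minor threshold. The helper below is a hypothetical sketch, not part of the monitor's API, and the window size of 5 is an arbitrary assumption.

```typescript
// Hypothetical helper: true when the last `window` cosine distances
// are all at or above the minor-drift threshold.
function hasPersistentDrift(
  recentDistances: number[],
  window = 5,
  minorThreshold = 0.01,
): boolean {
  if (window <= 0 || recentDistances.length < window) return false;
  return recentDistances.slice(-window).every((d) => d >= minorThreshold);
}
```

With a 60-second polling interval, a window of 5 means drift must persist for roughly five minutes before it is flagged, which filters out one-off readings while still catching slow, sustained changes.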
Next Steps
- Cognitive Envelope -- The behavioral dimension monitoring that complements SVD fingerprinting
- Circuit Breakers in Depth -- What happens when I(theta) drops below critical
- Canary Probes -- Behavioral verification that validates what fingerprinting detects