Model Output Provenance Fingerprint: Structural Proximity Without Model Access

by Nick Clark | Published March 27, 2026

Outputs of generative models inherit structural properties from their inputs and from the model that produced them. The provenance fingerprint disclosed in Provisional Application 63/808,372 attaches to each generated output a record carrying the variance-derived identity of the inputs it was conditioned on and the variance-derived identity of the generator that produced it, sealed so that downstream consumers can verify both attributions without access to the model's weights or to its training logs. The fingerprint is tamper-evident: any modification to the output, to the input identifiers, or to the generator identifier breaks the seal and is detectable by any verifier holding the public verification material. A consumer of the output therefore knows which inputs entered the generator, which generator produced it, and that no party in the chain has substituted a different attribution after the fact.


Mechanism

The provenance fingerprint mechanism operates at the moment of output generation. As the generator produces an output, the runtime computes the structural variance vector of the output, retrieves the variance-derived identifiers of the inputs that conditioned the generation, and retrieves the variance-derived identifier of the generator itself. The generator identifier is computed from the structural variance of a designated reference manifold — a fixed corpus of probe inputs against which the generator's outputs are characterized — rather than from the generator's weights, so the identifier is stable across deployment formats and is computable by any party who can submit the probe inputs to the generator and observe the outputs.

The output's variance vector, the input identifier set, and the generator identifier are concatenated into a provenance record. The record is hashed, and the hash is signed by the generator runtime under a key whose certificate is attested by a registration authority. The signed hash, the public elements of the record, and the certificate chain are bundled with the output. The output's own identity hash includes the provenance record hash, so the output cannot be distributed under an identity that omits its provenance.
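The sealing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `variance_vector` is a hypothetical placeholder (the text here does not specify how structural variance is computed), the field names and the sorted, JSON-canonicalized record layout are choices made for the example, and an HMAC stands in for the runtime's certificate-backed public-key signature because the Python standard library has no asymmetric signing. The certificate chain is omitted.

```python
import hashlib
import hmac
import json


def variance_vector(data: bytes, bins: int = 8) -> list:
    """Hypothetical placeholder: population variance of byte values per chunk."""
    size = max(1, -(-len(data) // bins))  # ceiling division
    vector = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        mean = sum(chunk) / len(chunk)
        vector.append(round(sum((b - mean) ** 2 for b in chunk) / len(chunk), 6))
    return vector


def canonical(obj) -> bytes:
    """Deterministic serialization so the same record always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def seal(output: bytes, input_ids: list, generator_id: str, key: bytes) -> dict:
    # Assemble the provenance record from its three components.
    record = {
        "output_vector": variance_vector(output),
        "input_ids": sorted(input_ids),  # assumed: the set is order-insensitive
        "generator_id": generator_id,
    }
    record_hash = hashlib.sha256(canonical(record)).digest()
    # Stand-in for the runtime's signature over the record hash.
    signature = hmac.new(key, record_hash, hashlib.sha256).hexdigest()
    # The output identity covers the record hash, so the output cannot be
    # given an identity that omits its provenance.
    output_identity = hashlib.sha256(
        hashlib.sha256(output).digest() + record_hash
    ).hexdigest()
    return {
        "record": record,
        "record_hash": record_hash.hex(),
        "signature": signature,
        "output_identity": output_identity,
    }
```

Because the output identity is derived from both the output bytes and the record hash, changing any attribution in the record changes the identity under which the output is distributed.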

Verification is performed by recomputing the output's variance vector from the output bytes, comparing it to the vector recorded in the provenance, hashing the record, and validating the signature against the certificate chain. A successful verification establishes that the output is the same artifact whose provenance was sealed, that the input identifiers and generator identifier in the record were the values present at sealing time, and that the sealer was the generator runtime whose certificate is attested. A failed verification, whether by vector mismatch, hash mismatch, or signature failure, indicates tampering, and the output is rejected as unattributable.
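The three failure modes named above can be made concrete with a short sketch. The assumptions are the same throughout: `variance_vector` is a hypothetical placeholder for the undisclosed variance computation, vectors are compared for exact equality (a real verifier might use a tolerance), the bundle layout is invented for the example, and an HMAC under a shared key stands in for public-key signature validation against a certificate chain.

```python
import hashlib
import hmac
import json


def variance_vector(data: bytes, bins: int = 8) -> list:
    """Hypothetical placeholder: population variance of byte values per chunk."""
    size = max(1, -(-len(data) // bins))  # ceiling division
    vector = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        mean = sum(chunk) / len(chunk)
        vector.append(round(sum((b - mean) ** 2 for b in chunk) / len(chunk), 6))
    return vector


def canonical(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def seal(output: bytes, input_ids: list, generator_id: str, key: bytes) -> dict:
    """Minimal sealer, included only so the verifier has something to check."""
    record = {
        "output_vector": variance_vector(output),
        "input_ids": sorted(input_ids),
        "generator_id": generator_id,
    }
    record_hash = hashlib.sha256(canonical(record)).digest()
    signature = hmac.new(key, record_hash, hashlib.sha256).hexdigest()
    return {"record": record, "record_hash": record_hash.hex(), "signature": signature}


def verify(output: bytes, bundle: dict, key: bytes) -> str:
    """Return 'ok' or the name of the first check that failed."""
    record = bundle["record"]
    # 1. Recompute the vector from the bytes actually received.
    if variance_vector(output) != record["output_vector"]:
        return "vector mismatch"
    # 2. Re-hash the record and compare with the sealed hash.
    record_hash = hashlib.sha256(canonical(record)).digest()
    if record_hash.hex() != bundle["record_hash"]:
        return "hash mismatch"
    # 3. Validate the signature over the record hash (HMAC stand-in).
    expected = hmac.new(key, record_hash, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["signature"]):
        return "signature failure"
    return "ok"
```

Any result other than "ok" would lead the consumer to reject the output as unattributable.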

Tamper-evidence extends to the input attributions. Each input identifier in the record is itself a variance-derived identity computed by the same anchoring procedure that anchors atomic content. A consumer who possesses the input can verify that its variance vector matches the identifier in the provenance record. An attacker who substitutes a different input cannot make it match the recorded identifier, because the identifier is recomputed from the input's own variance vector, and cannot revise the provenance record to match the substitute without invalidating the signature. The generator identifier is similarly verifiable: a consumer who possesses or can query the generator can verify it against the reference manifold.

Operating Parameters

The reference manifold for generator identification consists of a published set of probe inputs spanning the generator's expected input domain. The set is sized to characterize the generator's behavior with discrimination sufficient to distinguish it from other generators in the registry while keeping the cost of computing the identifier tractable. Reference implementations use probe sets containing between several hundred and several thousand inputs per modality, with the variance vectors of the resulting outputs concatenated and hashed to form the generator identifier. The probe set is signed by the registration authority and is publicly retrievable, so any party can independently recompute a generator's identifier given access to the generator.
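As an illustration of how such an identifier might be derived from black-box access alone, the sketch below submits each probe to a generator, computes a variance vector for each response, and hashes the concatenation. The `variance_vector` function is a hypothetical placeholder, the length-prefixed float encoding is an assumption made so that the hash is unambiguous, and the two toy generators exist only to show that different behavior yields different identifiers.

```python
import hashlib
import struct
from typing import Callable, List


def variance_vector(data: bytes, bins: int = 8) -> list:
    """Hypothetical placeholder: population variance of byte values per chunk."""
    size = max(1, -(-len(data) // bins))  # ceiling division
    vector = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        mean = sum(chunk) / len(chunk)
        vector.append(round(sum((b - mean) ** 2 for b in chunk) / len(chunk), 6))
    return vector


def generator_identifier(generate: Callable[[bytes], bytes], probes: List[bytes]) -> str:
    """Hash the variance vectors of the generator's outputs on the probe set."""
    digest = hashlib.sha256()
    for probe in probes:  # probe order is part of the published set
        vector = variance_vector(generate(probe))
        digest.update(struct.pack(">I", len(vector)))  # length prefix
        digest.update(struct.pack(f">{len(vector)}d", *vector))
    return digest.hexdigest()


# Toy generators standing in for real models (illustrative only).
def toy_generator_a(probe: bytes) -> bytes:
    return bytes((b * 3) % 256 for b in probe)


def toy_generator_b(probe: bytes) -> bytes:
    return bytes((b * 5) % 256 for b in probe)


PROBES = [bytes(range(16)), bytes(range(100, 132))]
id_a = generator_identifier(toy_generator_a, PROBES)
id_b = generator_identifier(toy_generator_b, PROBES)
```

Nothing in the computation touches weights: any party with the published probes and query access can recompute the identifier and compare it with the registered value. A generator with nondeterministic outputs would need additional handling that this sketch does not attempt.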

The signing key used by the generator runtime is bound to the generator instance through an attested certificate. The certificate is issued by a registration authority that has verified, by submitting the probe set, that the generator instance produces the outputs corresponding to the claimed identifier. The certificate is renewed periodically and is revocable; consumers verifying a provenance record consult a revocation channel to ensure the certificate was valid at the time of sealing. The mechanism does not require trust in the generator operator; it requires only trust in the registration authority's attestation procedure.
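The check that a certificate was valid at the time of sealing might look like the sketch below. The `Certificate` fields, the integer timestamps, and the shape of the revocation data (a mapping from certificate serial to revocation time) are all assumptions for illustration, as is the rule that records sealed before a revocation remain acceptable.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Certificate:
    serial: str        # hypothetical certificate identifier
    generator_id: str  # identifier the authority attested for this instance
    not_before: int    # start of validity window (epoch seconds)
    not_after: int     # end of validity window (epoch seconds)


def valid_at_sealing(cert: Certificate, sealed_at: int, revocations: Dict[str, int]) -> bool:
    """True if the certificate was inside its window and unrevoked when sealing occurred."""
    if not (cert.not_before <= sealed_at <= cert.not_after):
        return False
    revoked_at = revocations.get(cert.serial)
    # Assumed policy: a later revocation does not invalidate earlier seals.
    return revoked_at is None or sealed_at < revoked_at
```

A stricter deployment could reject every record under a revoked certificate regardless of sealing time; the text leaves that choice open.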

Provenance records are bounded in size. The output variance vector occupies a fixed envelope determined by the modality calibration; the input identifier set is variable but bounded by the maximum number of inputs the generator accepts per generation; the generator identifier is fixed. The total record fits within a single network frame for typical deployments, so distribution of the provenance alongside the output adds minimal overhead. Where an output is conditioned on a very large input set, the record carries a Merkle root over the input identifiers and the consumer retrieves individual identifiers on demand.
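For the large-input case, a Merkle root lets the record stay small while each identifier remains individually checkable. The sketch below is one conventional construction, not necessarily the one used in the disclosure: SHA-256, distinct prefixes for leaf and interior hashes, and duplication of the last node on odd-sized levels are all assumptions, and the duplication rule is a simplification that a production design would likely replace.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _leaf_level(leaves: List[bytes]) -> List[bytes]:
    return [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 marks a leaf hash


def _parent_level(level: List[bytes]) -> List[bytes]:
    # 0x01 marks an interior hash; level length is even when this is called.
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    level = _leaf_level(leaves)
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the odd node
        level = _parent_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the left."""
    level = _leaf_level(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _parent_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```

A consumer holding only the root from the provenance record can then confirm a retrieved identifier with a proof whose length grows logarithmically in the number of inputs.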

Verification cost is dominated by the recomputation of the output's variance vector, which is bounded by the modality calibration and is amortizable across multiple verifications of the same output. Signature verification operates on the fixed-size record hash, so its cost does not grow with the size of the output. The mechanism is therefore deployable in consumer-side verifiers running on commodity hardware.

Alternative Embodiments

Embodiments vary in how the generator identifier is computed. The reference embodiment uses a fixed probe set; an alternative uses a randomized probe set committed to in advance via a verifiable random function, which prevents a generator operator from tuning the generator to perform well on the published probes. A further alternative composes the generator identifier from multiple probe sets covering distinct sub-domains, yielding a structured identifier whose components can be verified independently.

Embodiments vary in the granularity of input attribution. A coarse embodiment records the identifier of each top-level input that conditioned the generation; a fine embodiment records identifiers for sub-input components, such as individual passages of a retrieved document or individual frames of a reference video. The choice of granularity is recorded in the provenance, so verifiers know what level of attribution to expect.

Embodiments vary in the signing arrangement. A single-signer embodiment has the generator runtime sign the provenance with a single key; a multi-signer embodiment has independent runtimes — for example, the model server and a separate attestation enclave — sign concurrently, with verification requiring that all signatures validate. The multi-signer embodiment increases robustness to single-key compromise at the cost of additional infrastructure.
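The all-signatures-must-validate rule of the multi-signer embodiment can be sketched briefly. As an assumption for a self-contained example, each signer is represented by an HMAC key rather than a public-key certificate, and the signer names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def sign(key: bytes, record_hash: bytes) -> bytes:
    """HMAC stand-in for one signer's public-key signature over the record hash."""
    return hmac.new(key, record_hash, hashlib.sha256).digest()


def verify_all(record_hash: bytes, signatures: Dict[str, bytes], required: Dict[str, bytes]) -> bool:
    """Accept only if every required signer supplied a valid signature."""
    for name, key in required.items():
        supplied = signatures.get(name)
        if supplied is None:
            return False  # a missing signer is a failure, not a skip
        if not hmac.compare_digest(sign(key, record_hash), supplied):
            return False
    return True
```

Treating a missing signature as a failure is what gives the arrangement its robustness: compromising one key is not enough to produce an acceptable record.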

Embodiments contemplate streaming outputs. For outputs produced incrementally — long-form text, video, audio — the provenance record is structured as a series of segment records each carrying the variance vector of its segment, the active input identifier set at the time of the segment, and the generator identifier, all signed and chained so that segments cannot be reordered or substituted. The consumer verifies each segment as it arrives and verifies the chain on completion.
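One way to chain segment records so that reordering and substitution are detectable is to have each record commit to its position and to the hash of its predecessor. The sketch below assumes that layout; the field names, the all-zero genesis marker, the placeholder `variance_vector`, and the HMAC stand-in for the runtime signature are all illustrative. For brevity it uses one input identifier set for the whole stream, whereas the text allows the active set to differ per segment.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed marker meaning "no previous segment"


def variance_vector(data: bytes, bins: int = 8) -> list:
    """Hypothetical placeholder: population variance of byte values per chunk."""
    size = max(1, -(-len(data) // bins))  # ceiling division
    vector = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        mean = sum(chunk) / len(chunk)
        vector.append(round(sum((b - mean) ** 2 for b in chunk) / len(chunk), 6))
    return vector


def _digest(body: dict) -> str:
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def _sign(key: bytes, digest_hex: str) -> str:
    return hmac.new(key, bytes.fromhex(digest_hex), hashlib.sha256).hexdigest()


def seal_stream(segments: list, input_ids: list, generator_id: str, key: bytes) -> list:
    records, prev = [], GENESIS
    for index, segment in enumerate(segments):
        body = {
            "index": index,
            "prev": prev,  # link to the previous segment record
            "segment_vector": variance_vector(segment),
            "input_ids": sorted(input_ids),
            "generator_id": generator_id,
        }
        digest = _digest(body)
        records.append({"body": body, "hash": digest, "signature": _sign(key, digest)})
        prev = digest
    return records


def verify_stream(segments: list, records: list, key: bytes) -> bool:
    if len(segments) != len(records):
        return False
    prev = GENESIS
    for index, (segment, record) in enumerate(zip(segments, records)):
        body = record["body"]
        if body["index"] != index or body["prev"] != prev:
            return False  # reordered or spliced
        if variance_vector(segment) != body["segment_vector"]:
            return False  # segment bytes do not match the sealed vector
        digest = _digest(body)
        if digest != record["hash"]:
            return False
        if not hmac.compare_digest(_sign(key, digest), record["signature"]):
            return False
        prev = digest
    return True
```

The loop body is the per-segment check a consumer could run as each segment arrives; running it over the whole list corresponds to verifying the chain on completion.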

Embodiments contemplate post-generation transformations. An output that is intentionally transformed after generation — recompressed, cropped, translated — records the transformation as a derivative provenance edge whose source is the original sealed output and whose destination is the transformed output, computed by the same composite-lineage decomposition mechanism that handles multi-root attribution. The transformed output therefore carries both its derivative attribution and, transitively, the original input and generator attributions.
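The transitive attribution described above can be pictured as a walk along derivative edges back to a sealed original. The sketch below is only a data-structure illustration: the edge and record layouts are hypothetical, and it does not attempt the composite-lineage decomposition mechanism itself, which is not described in this article.

```python
from typing import Dict


def resolve_attribution(identity: str, edges: Dict[str, dict], sealed: Dict[str, dict]) -> dict:
    """Follow derivative edges from a transformed output back to its sealed source.

    edges maps a transformed output's identity to {"source", "transformation"};
    sealed maps an original output's identity to its provenance attributions.
    """
    transformations = []
    seen = set()
    while identity not in sealed:
        if identity in seen:
            raise ValueError("cycle in derivative edges")
        seen.add(identity)
        edge = edges[identity]  # KeyError if the lineage is unknown
        transformations.append(edge["transformation"])
        identity = edge["source"]
    original = sealed[identity]
    return {
        "original": identity,
        "transformations": transformations,  # most recent first
        "input_ids": original["input_ids"],
        "generator_id": original["generator_id"],
    }
```

In a full implementation each edge and each sealed record would itself be verified; here the structures are trusted so that the traversal stays visible.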

Composition with Other Mechanisms

Output provenance composes with atomic content anchoring through the input identifiers. Each input identifier is the variance-derived identity of an atomic work, so the provenance record is a structured pointer set into the same identity space that anchors non-generated content. A verifier that holds an input can verify its identity directly; a verifier that does not can request the input by identifier from any party that holds it.

Output provenance composes with composite lineage when a generated output is incorporated into a composite. The composite's lineage graph carries an edge to the output's identity record; the identity record references the provenance; the provenance references the inputs and the generator. A verifier traversing the composite's lineage thus reaches the inputs and generator without any party in the chain having to expose internal state.

Output provenance composes with policy-resident governance frameworks that gate downstream use based on attribution. A policy that admits only outputs produced by registered generators reads the generator identifier and validates it against the registry; a policy that requires consent from the holders of all input attributions enumerates the input identifier set; a policy that bars derivatives of certain corpora rejects an output whose input set intersects the prohibited set. Because the attributions are tamper-evident, the policy decisions are not subject to evasion by re-labeling.
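The three policy examples just described reduce to set operations over a verified provenance record. The following sketch assumes the record has already passed verification; the `Policy` structure, its field names, and the order in which the checks run are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Policy:
    registered_generators: FrozenSet[str]
    prohibited_inputs: FrozenSet[str] = frozenset()
    # None means the policy does not require consent from input holders.
    consented_inputs: Optional[FrozenSet[str]] = None


def admit(record: dict, policy: Policy) -> Tuple[bool, str]:
    """Decide whether a verified provenance record satisfies the policy."""
    inputs = set(record["input_ids"])
    # Registered-generator rule: the generator identifier must be in the registry.
    if record["generator_id"] not in policy.registered_generators:
        return False, "unregistered generator"
    # Prohibited-corpus rule: any intersection with the barred set rejects.
    if inputs & policy.prohibited_inputs:
        return False, "prohibited input"
    # Consent rule: every input attribution must be covered by consent.
    if policy.consented_inputs is not None and not inputs <= policy.consented_inputs:
        return False, "missing consent"
    return True, "admitted"
```

The decision depends only on fields that the signature covers, which is why relabeling an output does not change the outcome.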

Output provenance composes with audit and accountability infrastructure. An auditor receiving a corpus of generated outputs can sample, verify, and aggregate provenance records to characterize generator behavior across populations without the generator operator's cooperation, because the verification material is public. The composition is therefore appropriate for regulatory and forensic settings as well as for routine provenance.

Prior-Art Distinctions

Conventional approaches to model-output provenance fall into four families, and the disclosed mechanism differs from each.

Watermarking embeds a signal in the output that identifies the generator. Watermarks are removable by adversarial processing, do not attribute inputs, and rely on every relevant generator applying a watermark. The disclosed mechanism does not depend on an embedded signal: the variance vector is computed from the output's own structure, and the provenance record is sealed by signature rather than by embedding.

Logged-attribution systems require the generator operator to publish training and inference logs from which provenance can be reconstructed. The trust model relies on the operator's honesty and on the integrity of the logs; it requires access to operator-controlled infrastructure to verify. The disclosed mechanism shifts attribution to the moment of generation, seals it with a key attested independently, and exposes verification to any consumer without requiring access to operator logs.

Model-card and metadata systems describe a generator's training data and characteristics in human-readable form. Such descriptions are not tied to individual outputs and are not verifiable from the output itself. The disclosed mechanism produces a per-output, machine-verifiable record that links a specific output to specific input identifiers and to a measurable generator identifier.

Cryptographic commitment schemes for model outputs anticipate the binding of an output to its inputs but typically commit to the inputs by hash without addressing whether the hash corresponds to the input the consumer holds. The disclosed mechanism uses variance-derived identity, so the commitment binds to the structural identity of the input rather than to a hash of an arbitrary representation, and a consumer can verify the binding by recomputing the identity from the input it actually possesses.

Disclosure Scope

This article describes the model-output provenance fingerprint mechanism as disclosed in Provisional Application 63/808,372 covering content anchoring through structural variance analysis. The disclosure encompasses the computation of a variance vector for the generated output, the assembly of a provenance record carrying input identifiers and a generator identifier, the signing of the record under an attested generator-runtime key, the binding of the output's identity to the provenance record, and the consumer-side verification procedure.

The scope extends to embodiments that vary the generator-identifier probe set, the granularity of input attribution, the signing arrangement, the handling of streaming outputs, and the integration with post-generation transformations, provided that the per-output record is sealed against tampering and that verification is performable without access to the generator's weights or training logs. Specific generator architectures, training procedures, and domain applications are out of scope except as exemplars.

Claims arising from the disclosure cover the structural arrangement and its enforcement consequences, including the tamper-evidence of the provenance record, the verifiability of input and generator attributions independent of operator infrastructure, and the composability of output provenance with atomic anchoring and composite lineage. Implementations practicing one or more of these features in combination fall within the claim scope regardless of the modality of the generator or the domain of the consumer.

Invented by Nick Clark. Founding Investors: Anonymous, Devin Wilkie