Semantic Lineage Recording
by Nick Clark | Published March 27, 2026
Each inference call performed under governed inference produces a lineage record: a structured artefact naming the inputs the call consumed, the model and version that produced it, the outputs it returned, the governance class under which it was admitted, and the capability tier of the agent that requested it. The record is anchored to a substrate that makes subsequent tampering detectable: any alteration of an entry, any reordering of entries, or any silent deletion is revealed on inspection. The lineage is not a debug log retained at the operator's discretion; it is a first-class structural artefact of the inference itself, and an inference that produces no lineage is, by construction, not a governed inference. This article describes the structural mechanics of lineage recording, the parameters that govern record content and retention, alternative embodiments of the anchoring substrate, composition with the rest of the inference-control stack, prior-art distinctions, and the breadth of the disclosure.
Mechanism
The lineage subsystem sits structurally adjacent to the admissibility gate. When the gate admits a candidate inference step, the engine emits the resulting transition together with a lineage record. The record is composed deterministically: it names the canonical input digest (a hash of the prompt, the retrieved context, and any tool outputs consumed by the step), the model identifier and exact version, the output digest, the governance class assigned by the policy at the moment of admission, and the capability tier and intent declared by the requesting agent. A monotonic sequence number and a wall-clock timestamp anchor the record in time, and a back-reference to the immediately preceding record in the same agent's lineage anchors it in causal order.
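As a minimal sketch of deterministic record composition, the following assumes SHA-256 over sorted-key, compact JSON as the canonical digest. The field names, `LineageRecord`, and `record_admission` are illustrative only, not a published schema. The clock is injectable so that a record's digest is reproducible in tests.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


def canonical_digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    sequence: int               # monotonic within one agent's lineage
    timestamp: float            # wall-clock anchor in time
    predecessor: Optional[str]  # digest of the preceding record: anchor in causal order
    input_digest: str           # prompt + retrieved context + consumed tool outputs
    model_id: str
    model_version: str
    output_digest: str
    governance_class: str       # assigned by policy at the moment of admission
    capability_tier: str
    intent: str

    def digest(self) -> str:
        return canonical_digest(asdict(self))


def record_admission(prev: Optional[LineageRecord], prompt: str, context: list,
                     tool_outputs: list, model_id: str, model_version: str, output: str,
                     governance_class: str, capability_tier: str, intent: str,
                     clock: Callable[[], float] = time.time) -> LineageRecord:
    """Compose the record for one admitted step (hypothetical helper)."""
    return LineageRecord(
        sequence=0 if prev is None else prev.sequence + 1,
        timestamp=clock(),
        predecessor=None if prev is None else prev.digest(),
        input_digest=canonical_digest(
            {"prompt": prompt, "context": context, "tool_outputs": tool_outputs}),
        model_id=model_id,
        model_version=model_version,
        output_digest=canonical_digest(output),
        governance_class=governance_class,
        capability_tier=capability_tier,
        intent=intent,
    )
```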
Rejected candidates produce rejection records of the same shape, with the output field replaced by a structured reason and the governance class replaced by the rejection class. Decomposed candidates produce a parent record naming the original candidate and child records for each decomposition product, allowing a reader to reconstruct exactly how a complex inference was broken down.
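The sketch below, again with hypothetical names, illustrates one reading of "the same shape": admitted and rejected records share a single key set, with the `outcome` and `class` slots carrying an output digest and a governance class in one case and a structured reason and a rejection class in the other. Decomposition yields a parent record that names its children and child records that name their parent.

```python
import hashlib
import json


def digest(obj) -> str:
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")).hexdigest()


def admitted(candidate: str, output: str, governance_class: str) -> dict:
    return {"kind": "admitted", "candidate": candidate,
            "outcome": {"output_digest": digest(output)},
            "class": governance_class}


def rejected(candidate: str, code: str, detail: str, rejection_class: str) -> dict:
    # Same shape as an admission: the outcome slot carries a structured reason
    # and the class slot carries the rejection class instead of a governance class.
    return {"kind": "rejected", "candidate": candidate,
            "outcome": {"reason": {"code": code, "detail": detail}},
            "class": rejection_class}


def decomposed(candidate: str, parts: list, governance_class: str):
    """A parent record naming the original candidate, plus one child per product."""
    child_ids = [f"{candidate}/{i}" for i in range(len(parts))]
    parent = {"kind": "decomposed", "candidate": candidate,
              "outcome": {"children": child_ids}, "class": governance_class}
    children = [{"kind": "decomposition_child", "candidate": cid,
                 "outcome": {"parent": candidate, "part_digest": digest(part)},
                 "class": governance_class}
                for cid, part in zip(child_ids, parts)]
    return parent, children
```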
Records are anchored to the substrate by a tamper-evident structure, typically a hash chain in which each record's digest includes the digest of its predecessor, optionally rolled up into a Merkle root that is committed to an external witness at a configurable cadence. The anchoring discipline is what distinguishes the lineage from an ordinary log: a reader presented with a contiguous range of records and the corresponding root commitment can verify, without trusting the operator, that no record has been altered, inserted, or removed.
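A minimal sketch of the anchoring check, assuming SHA-256, a 32-byte zero genesis value, and a binary Merkle tree that uses 0x00/0x01 leaf and node prefixes in the style of RFC 6962. Promoting an odd last node unchanged is one convention among several. The verifier needs only the records and the witnessed root; the function names are illustrative.

```python
import hashlib

GENESIS = b"\x00" * 32


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chain_digests(records):
    """d_i = H(d_{i-1} || H(record_i)): each digest commits to everything before it."""
    out, prev = [], GENESIS
    for rec in records:
        prev = sha256(prev + sha256(rec))
        out.append(prev)
    return out


def merkle_root(leaves):
    """Binary Merkle root with 0x00/0x01 leaf/node prefixes; an odd last node is promoted."""
    if not leaves:
        return sha256(b"")
    level = [sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [sha256(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def verify_range(records, witnessed_root: bytes) -> bool:
    """Recompute chain and root from the records alone; trusts only the witnessed root."""
    return merkle_root(chain_digests(records)) == witnessed_root
```

Because each chained digest already commits to every earlier record, the chain head alone would detect tampering; the Merkle layer is what allows inclusion proofs for individual records without shipping the whole range.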
The records are append-only at the structural level. The subsystem provides no in-place mutation primitive. Corrections, retractions, and superseding decisions are themselves recorded as new entries that reference the entries they supersede, leaving the original entry intact and the supersession traceable. This is not a policy decision that can be relaxed; it is the structural property that makes the lineage a useful audit artefact.
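The append-only discipline can be illustrated with a hypothetical `AppendOnlyLineage` class that simply offers no update or delete method. A correction is a new entry whose `supersedes` field points back to the entry it replaces; a reader follows those links to find the entry currently standing in for an original, while the original remains readable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    seq: int
    body: str
    supersedes: Optional[int] = None   # seq of the entry this one corrects or retracts


class AppendOnlyLineage:
    """Hypothetical store: append is the only write primitive."""

    def __init__(self):
        self._entries = []

    def append(self, body: str, supersedes: Optional[int] = None) -> Entry:
        if supersedes is not None and not (0 <= supersedes < len(self._entries)):
            raise ValueError("can only supersede an existing entry")
        entry = Entry(len(self._entries), body, supersedes)
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[Entry, ...]:
        return tuple(self._entries)    # read-only snapshot of frozen entries

    def current(self, seq: int) -> Entry:
        """Follow supersession links forward to the latest entry standing in for seq."""
        latest = seq
        for e in self._entries:        # supersessions always point backwards in seq
            if e.supersedes == latest:
                latest = e.seq
        return self._entries[latest]
```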
The recording discipline is synchronous with respect to semantic-state commitment. The subsystem will not advance semantic state until the record corresponding to the admitting transition has been durably accepted by the anchoring substrate; if anchoring fails, the transition is rolled back and re-emitted as a rejection event whose reason names the anchoring failure. This synchronous coupling is what guarantees the invariant that no inference can affect committed state without leaving a verifiable trace, and it is the structural difference between lineage in the present sense and any best-effort logging discipline that can be lost without consequence.
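A sketch of the synchronous coupling, assuming a hypothetical `anchor` callable that returns only after durable acceptance and raises `AnchoringError` otherwise. The transition is staged, committed only once anchoring succeeds, and otherwise discarded and replaced by a rejection event. How that rejection event is itself anchored is deliberately left out of this sketch.

```python
class AnchoringError(Exception):
    """Raised by the anchoring substrate when it cannot durably accept a record."""


class GovernedStateManager:
    """Hypothetical state manager whose transitions are gated on record anchoring."""

    def __init__(self, anchor):
        self.anchor = anchor   # callable(record); returns only after durable acceptance
        self.state = {}
        self.events = []       # emitted events: admitted records or rejection records

    def apply(self, candidate: str, key: str, value, record: dict) -> bool:
        staged = {**self.state, key: value}   # transition prepared, not yet committed
        try:
            self.anchor(record)                # synchronous: no fire-and-forget logging
        except AnchoringError as exc:
            # Roll back by discarding the staged state; re-emit as a rejection event.
            self.events.append({"kind": "rejected", "candidate": candidate,
                                "reason": f"anchoring failed: {exc}"})
            return False
        self.state = staged                    # commit only after anchoring succeeded
        self.events.append(record)
        return True
```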
Operating Parameters
The record schema is the first parameter. A deployment selects which fields beyond the mandatory core (input digest, model identifier, output digest, governance class, capability tier, sequence number, predecessor reference) are recorded: full input plaintext or only its digest, full output plaintext or only its digest, intermediate scratch state, attention or activation summaries, tool-invocation traces. Recording more yields stronger reproducibility at the cost of greater storage and exfiltration risk; recording less reduces both costs but weakens reproducibility.
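The schema parameter might be expressed as a whitelist of optional fields layered over a core that cannot be switched off. The field names mirror the prose above; the `RecordSchema` type and its `project` method are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

MANDATORY_CORE = frozenset({
    "input_digest", "model_id", "output_digest", "governance_class",
    "capability_tier", "sequence", "predecessor",
})
OPTIONAL_FIELDS = frozenset({
    "input_plaintext", "output_plaintext", "scratch_state",
    "activation_summary", "tool_trace",
})


@dataclass(frozen=True)
class RecordSchema:
    optional: FrozenSet[str] = frozenset()

    def __post_init__(self):
        unknown = self.optional - OPTIONAL_FIELDS
        if unknown:
            raise ValueError(f"unknown optional fields: {sorted(unknown)}")

    def project(self, full_record: dict) -> dict:
        """Keep the mandatory core plus whichever optional fields this deployment selected."""
        missing = MANDATORY_CORE - set(full_record)
        if missing:
            raise ValueError(f"missing mandatory fields: {sorted(missing)}")
        keep = MANDATORY_CORE | self.optional
        return {k: v for k, v in full_record.items() if k in keep}
```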
The anchoring cadence determines how often the rolling hash root is committed to an external witness, and consequently the maximum number of records that could be silently rewritten by a sufficiently capable adversary before the rewrite would be detectable. Cadences range from per-record, in high-assurance deployments, to per-batch or per-epoch, in throughput-bound deployments. The witness substrate is itself a parameter: a tenant-internal append-only store, a cross-tenant transparency log, or an external blockchain are all contemplated.
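The relationship between cadence and exposure can be made concrete with a hypothetical `CadencedAnchor` that commits its rolling chain head to a witness every `cadence` records. Here the witness is just a Python list standing in for a tenant store, transparency log, or blockchain. At any moment, the records appended since the last commitment are the ones not yet covered by a witnessed value.

```python
import hashlib


class CadencedAnchor:
    """Commits a rolling chain head to a witness every `cadence` records (hypothetical)."""

    def __init__(self, cadence: int, witness: list):
        if cadence < 1:
            raise ValueError("cadence must be >= 1")
        self.cadence = cadence
        self.witness = witness          # stand-in for an external append-only witness
        self.head = b"\x00" * 32
        self.count = 0
        self.last_committed = 0

    def append(self, record: bytes) -> None:
        self.head = hashlib.sha256(self.head + hashlib.sha256(record).digest()).digest()
        self.count += 1
        if self.count % self.cadence == 0:
            self.witness.append((self.count, self.head))
            self.last_committed = self.count

    def exposure(self) -> int:
        """Records not yet covered by any witnessed head; between 0 and cadence - 1."""
        return self.count - self.last_committed
```

A per-record cadence of 1 keeps exposure at zero after every append; a larger cadence trades that window for fewer witness writes.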
Retention horizon governs how long records are retained before being summarised or expired; redaction policy governs what fields, if any, may be redacted from a retained record under what authority, with the redaction itself producing a recorded supersession. Access-control policy governs who may read which records and is enforced at the substrate, not at a perimeter. Together these parameters allow the same recording mechanism to serve compliance, debugging, scientific reproducibility, and post-incident forensics without architectural changes.
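One way to reconcile redaction with append-only anchoring, assumed here rather than taken from the disclosure, is to keep only field digests on the chain and hold plaintexts in a separate store. Redacting then deletes the plaintext, appends a supersession naming the authority, and leaves the anchored digests untouched. `RedactableLineage` and its methods are hypothetical.

```python
import hashlib
import json


def _digest(value) -> str:
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


class RedactableLineage:
    def __init__(self, redaction_authorities):
        self.chain = []                  # append-only: digests and supersessions only
        self._content = {}               # (seq, field) -> plaintext, held off-chain
        self._authorities = set(redaction_authorities)

    def append(self, fields: dict) -> int:
        seq = len(self.chain)
        self.chain.append({"seq": seq, "digests": {k: _digest(v) for k, v in fields.items()}})
        for k, v in fields.items():
            self._content[(seq, k)] = v
        return seq

    def redact(self, seq: int, field: str, authority: str) -> int:
        if authority not in self._authorities:
            raise PermissionError(f"{authority!r} is not a redaction authority")
        self._content.pop((seq, field))  # KeyError if absent or already redacted
        self.chain.append({"seq": len(self.chain), "supersedes": seq,
                           "redacted_field": field, "authority": authority})
        return len(self.chain) - 1

    def read(self, seq: int, field: str):
        return self._content.get((seq, field))   # None once redacted
```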
A further parameter governs the binding between record content and identity. Each record is signed by the inference engine that produced it under a key bound to the engine's deployment identity, so that a reader can distinguish records produced by an attested engine from records that merely claim to have been produced by one. The identity-binding parameter selects which signing authority is in force for a given deployment (a tenant-managed key, a platform-managed key, or a hardware-rooted attestation key), and the choice is itself recorded at deployment time so that the trust chain underlying any retrieved record is explicit rather than inferred.
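Python's standard library has no asymmetric signatures, so this sketch uses HMAC-SHA256 as a stand-in; a real deployment would use an asymmetric scheme such as Ed25519 so that readers can verify a record without holding the signing key. The `signing_authority` label inside the signed body reflects the identity-binding parameter; the authority names and function names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical labels for the identity-binding parameter.
SIGNING_AUTHORITIES = {"tenant-managed", "platform-managed", "hardware-attested"}


def _payload(body: dict) -> bytes:
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes, authority: str, engine_id: str) -> dict:
    """Bind the record to an engine identity and the authority in force (HMAC stand-in)."""
    if authority not in SIGNING_AUTHORITIES:
        raise ValueError(f"unknown signing authority: {authority}")
    body = dict(record, engine_id=engine_id, signing_authority=authority)
    tag = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": tag}


def verify_record(signed: dict, key: bytes) -> bool:
    expected = hmac.new(key, _payload(signed["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])
```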
Alternative Embodiments
The simplest embodiment writes records to a local append-only file with a per-process hash chain and a periodic root commitment to a tenant-internal witness. A clustered embodiment shards the chain across multiple writers under a sequencer that assigns sequence numbers and merges per-writer chains into a single linearised stream. A federated embodiment runs an independent chain per administrative domain and links them through cross-chain commitments, so that a regulator can verify the lineage of a cross-domain inference without any one domain being able to forge it.
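The sequencer of the clustered embodiment can be sketched as a component that checks each writer's local sequence for gaps, assigns a global sequence number, and extends a single linearised chain. The `Sequencer` API is hypothetical and ignores persistence, failover, and the federated cross-chain case.

```python
import hashlib


class Sequencer:
    """Merges per-writer chains into one linearised, hash-chained stream (hypothetical)."""

    def __init__(self):
        self.stream = []        # (global_seq, writer, local_seq, chain_head_hex)
        self.next_local = {}    # writer -> expected next local sequence number
        self.head = b"\x00" * 32

    def submit(self, writer: str, local_seq: int, record: bytes) -> int:
        expected = self.next_local.get(writer, 0)
        if local_seq != expected:
            # A gap or replay in a writer's own chain is refused, not silently merged.
            raise ValueError(f"{writer}: expected local_seq {expected}, got {local_seq}")
        self.next_local[writer] = expected + 1
        self.head = hashlib.sha256(self.head + hashlib.sha256(record).digest()).digest()
        global_seq = len(self.stream)
        self.stream.append((global_seq, writer, local_seq, self.head.hex()))
        return global_seq
```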
A confidential-compute embodiment runs the recording subsystem inside a trusted execution environment, with the chain root attested by the enclave so that even the operator cannot rewrite history. A privacy-preserving embodiment records only digests on the public chain and stores the corresponding plaintexts in a per-tenant encrypted store, allowing third-party verification of integrity without third-party access to content. A streaming embodiment emits records to a downstream audit consumer in real time, with the consumer maintaining its own anchored copy as a hot replica.
A summarising embodiment compresses the lineage of long-running agents by replacing dense per-step records with cryptographic commitments to their content plus a sparse set of representative records, allowing storage cost to grow sub-linearly with inference volume while preserving the ability to reconstruct any individual step on demand from the original engine. A cross-tenant embodiment maintains a shared anchoring chain across multiple tenants under a neutral operator, with per-tenant access controls applied at read time; this form is structurally appropriate for industry consortia in which member organisations require mutual auditability without granting one another open access to their inferences.
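For the summarising embodiment, a Merkle tree over a run of step records lets a deployment retain only the root and a sparse sample, then check any step regenerated on demand against that root with an inclusion proof. This is a sketch under assumed conventions (SHA-256, 0x00/0x01 leaf and node prefixes, odd nodes promoted unchanged); the function names and the every-third-record sample are illustrative.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(records):
    """All tree levels, leaves first; an odd last node is promoted unchanged."""
    if not records:
        raise ValueError("cannot commit to an empty run")
    levels = [[_leaf(r) for r in records]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [_node(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2:
            nxt.append(cur[-1])
        levels.append(nxt)
    return levels


def inclusion_proof(levels, index):
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):                      # no sibling when promoted
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof


def verify_inclusion(record: bytes, proof, root: bytes) -> bool:
    acc = _leaf(record)
    for sibling_is_left, sibling in proof:
        acc = _node(sibling, acc) if sibling_is_left else _node(acc, sibling)
    return acc == root
```

Usage: keep `{"root": levels[-1][0], "count": n, "representatives": {...}}` in place of the dense records, regenerate step `i` from the engine when needed, and accept it only if `verify_inclusion` succeeds against the stored root.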
Composition with the Inference-Control Stack
Lineage recording composes tightly with the admissibility gate, which is its sole upstream producer of structured events; with the semantic-state manager, whose state digests appear in records and whose state transitions are gated on successful record commitment; with the policy compiler, whose governance classes are the controlled vocabulary that records reference; and with the capability-tier registry, whose tiers and intents are similarly referenced. The commitment-token from the resource-negotiation protocol appears in the record as the authority under which the inference consumed its resources, linking the inference-control lineage to the capability-awareness lineage in a single auditable graph.
Downstream, the lineage is consumed by the post-flight audit embodiment of the inference-control gate, by compliance reporting pipelines, by debugging and replay tooling, and by policy-refinement processes that mine rejection records for systematic gaps in policy coverage. The recording subsystem is the joint between live inference and every form of after-the-fact reasoning about that inference.
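A policy-refinement process mining rejection records might start with nothing more than a frequency count over (rejection class, reason) pairs. The record shape, the `policy_gaps` name, and the `min_count` threshold are assumptions for illustration.

```python
from collections import Counter


def policy_gaps(records, min_count=3):
    """Return (rejection_class, reason) pairs that recur at least min_count times."""
    counts = Counter(
        (r["class"], r["reason"]) for r in records if r.get("kind") == "rejected"
    )
    # most_common() orders by frequency, so the most systematic gaps come first.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]
```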
Prior-Art Distinctions
Conventional ML observability systems record inference inputs and outputs in a logging pipeline whose integrity properties are those of the underlying log infrastructure: at best, the log is append-only at the application level but mutable at the storage level, and silent deletion is undetectable to a reader. The mechanism described here differs structurally in that the lineage chain itself carries the integrity proof, independent of the storage layer's trust properties.
Provenance systems for data pipelines and scientific workflows share the structured-record discipline but are typically not anchored to a tamper-evident substrate, do not record governance class as a first-class field, and do not distinguish constructive transitions from rejection events. Transparency logs in the certificate-authority ecosystem share the anchoring discipline but record certificate issuances, not inference steps, and have no notion of capability tier or governance class. The combination of per-step inference recording, governance-class and capability-tier annotation, structural distinction between admitted and rejected events, and tamper-evident anchoring is the structural contribution of the present mechanism.
Disclosure Scope
The cognition patent discloses semantic lineage recording as a structural primitive applicable to any inference pipeline operated under governed-inference discipline. The disclosure expressly contemplates that the recorded unit may be a single token transition, a multi-token segment, a complete model invocation, or a multi-call agentic plan, and that the same anchoring discipline applies at every granularity.
The scope extends to embodiments in which records are produced by inference systems operated by third parties and submitted to a tenant-controlled lineage substrate; to embodiments in which the lineage is itself the input to a downstream inference (a reflective embodiment); and to embodiments in which lineage records cross organisational boundaries under structured access-control and redaction policies. Across all of these embodiments the invariants are the same: structured record schema with mandatory core fields, append-only structural discipline, tamper-evident anchoring, and explicit recording of admitted, rejected, and decomposed events as distinct first-class entry types.
The disclosure further contemplates application to inference performed by non-language models, including diffusion image and audio generators, classifier ensembles, retrieval rerankers, and policy-learning agents. In each case the recorded fields adapt to the modality but the structural discipline is identical: digest of input, identifier of model, digest of output, governance class, capability tier, predecessor reference, sequence number, anchored hash. Application to hybrid pipelines, in which a single user-visible response is composed from many inferences across many models, is contemplated through composite records that name each constituent inference's lineage entry and a composition-rule reference, allowing a reviewer to descend from the response to any individual sub-inference and verify it independently. The disclosure thus treats lineage recording not as a logging convenience but as the structural fabric on which any claim of governed inference must rest.
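A composite record for a hybrid pipeline can be sketched as a map from constituent lineage-entry identifiers to their recorded digests, plus a composition-rule reference and the response digest. Descending then means fetching each constituent entry and checking it independently. All names here, including the `fetch` callable, are hypothetical.

```python
import hashlib
import json


def digest(obj) -> str:
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")).hexdigest()


def composite_record(constituents: dict, composition_rule: str, response: str) -> dict:
    """constituents maps lineage-entry id -> that entry's recorded digest."""
    return {"kind": "composite",
            "constituents": dict(sorted(constituents.items())),
            "composition_rule": composition_rule,
            "output_digest": digest(response)}


def descend(composite: dict, fetch) -> dict:
    """Fetch each constituent entry and check it against the digest the composite names."""
    return {entry_id: digest(fetch(entry_id)) == expected
            for entry_id, expected in composite["constituents"].items()}
```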