Mechanism
Training-level memorization detection identifies when model output at inference time exhibits high similarity to a training artifact, and then determines from the training record which layer blocks the similar content was permitted to influence and at what magnitude. The mechanism does not estimate memorization by probing a finished model; it reads the record that the training loop already produced. As disclosed in the preceding sections of the chapter, the semantic execution substrate governs the training loop by treating each training example as a proposed semantic mutation, assigning it a depth aggregation profile, and recording the outcome in a training provenance log. Memorization detection is the query side of that infrastructure: when output is flagged as similar to a known artifact, the system runs a reverse provenance query against the log to recover how that artifact was integrated.
The detection is therefore structural rather than statistical. Because the depth-selective aggregation mechanism applied a per-block contribution weight vector to each training example's gradient signal, the log holds an exact account of which layer blocks each example was permitted to influence and at what magnitude. Memorization detection consults that account rather than reconstructing it after the fact.
The Reverse Provenance Query
Detection begins when model output at inference time is flagged as exhibiting high similarity to a known training artifact. The flag may originate from the rights-grade governance layer that governs inference, from an external content identification service, or from a human reviewer. On the flag, the memorization detection module initiates a reverse provenance query against the training provenance log. The query identifies the training examples that correspond to the flagged artifact and retrieves their stored records.
The retrieved records are the structured entries that the substrate wrote during training: the depth aggregation profile that was applied to the example's gradient signal, the per-layer contribution weights that record the actual gradient magnitude that reached each layer block after depth-selective modulation, the entropy band and policy scope under which the content was admitted, and the governance record identifying the policy objects that authorized admission and determined the depth profile. From these records the system determines which layer blocks the similar content was permitted to influence during training, the magnitude of the gradient signal that reached each influenced block, and whether the content was admitted under a suppressed depth profile or a full-depth profile.
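A minimal Python sketch of the reverse provenance query may help make the retrieved fields concrete. The record layout mirrors the fields listed above, but every name here (`ProvenanceRecord`, `artifact_id`, `reverse_provenance_query`, `influenced_blocks`, `is_suppressed`, `deep_start`) is hypothetical. The sketch also assumes that log entries carry an artifact identifier that can be matched directly; the text does not specify a schema or a matching method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical layout for one training provenance log entry."""
    example_id: str
    artifact_id: str                 # assumed key linking an example to a known artifact
    depth_profile: Sequence[float]   # per-block contribution weight vector that was applied
    contribution: Sequence[float]    # gradient magnitude that actually reached each block
    entropy_band: str
    policy_scope: str
    policy_ids: Sequence[str]        # governance record: authorizing policy objects


def reverse_provenance_query(log: Sequence[ProvenanceRecord],
                             artifact_id: str) -> List[ProvenanceRecord]:
    """Return the stored records for examples that correspond to the flagged artifact."""
    return [r for r in log if r.artifact_id == artifact_id]


def influenced_blocks(record: ProvenanceRecord, eps: float = 0.0) -> Dict[int, float]:
    """Map block index to recorded gradient magnitude, for blocks that received signal."""
    return {i: m for i, m in enumerate(record.contribution) if m > eps}


def is_suppressed(record: ProvenanceRecord, deep_start: int, eps: float = 0.0) -> bool:
    """True when every weight from block `deep_start` onward is zero (or within eps)."""
    return all(w <= eps for w in record.depth_profile[deep_start:])
```

Here `deep_start` is an assumed boundary between shallow and deep blocks; where that boundary sits for a given architecture is a policy choice the sketch leaves open.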
Shallow, Deep, or Absent Memorization
The memorization detection module produces a structured memorization assessment that classifies the similarity into one of three categories. The first is shallow memorization: the similar content was trained with a suppressed depth profile that confined its influence to the model's shallower layers. Shallow memorization indicates that the similarity is a consequence of shallow pattern matching, that is, lexical, syntactic, or local structural similarity, rather than deep conceptual encoding. It is the expected outcome when time-limited or rights-restricted content is properly governed during training.
The second category is deep memorization: the similar content was trained with a full-depth or deep-weighted profile, so its influence extends into the model's deeper layers where abstract representations and conceptual structures are encoded. Deep memorization may indicate that the content was deeply integrated because it was freely licensed and the deep integration was policy-compliant, or it may indicate a governance failure in which content that should have been depth-restricted was inadvertently trained with a full-depth profile.
The third category is absent memorization: the training provenance log contains no record of the similar content. This indicates that the output's similarity to the artifact is not a consequence of direct training on the artifact, but may instead arise from training on other content that shares structural patterns with it. The assessment's output is this three-way classification: the system reports which of the three states obtains, grounded in the depth profile records, rather than emitting a continuous score.
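The three categories can be sketched as a small classifier over the retrieved depth profiles. This is an illustration under stated assumptions, not the disclosed decision rule: the names `Memorization` and `classify`, the shallow/deep boundary `deep_start`, the threshold `eps`, and the choice to report deep memorization if any matching record has a non-zero deep-layer weight are all assumptions. How a recorded zero-weight profile should be classified is not stated in the text, and the sketch does not treat it specially.

```python
from enum import Enum
from typing import Sequence


class Memorization(Enum):
    SHALLOW = "shallow"
    DEEP = "deep"
    ABSENT = "absent"


def classify(profiles: Sequence[Sequence[float]],
             deep_start: int,
             eps: float = 1e-6) -> Memorization:
    """Classify from the depth profiles returned by the reverse provenance query.

    profiles   : one per-block weight vector per matching log record
    deep_start : assumed index of the first "deep" block
    eps        : assumed tolerance for "near-zero" deep-layer weights
    """
    if not profiles:
        # No record in the log: the artifact was not directly trained on.
        return Memorization.ABSENT
    for weights in profiles:
        if any(w > eps for w in weights[deep_start:]):
            # At least one record let signal into the deep blocks.
            return Memorization.DEEP
    # Every matching record confined its influence to the shallow blocks.
    return Memorization.SHALLOW
```

The function returns an enumeration member rather than a number, which matches the text's point that the assessment is categorical rather than a continuous score.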
Feedback Into Inference-Time Governance
The structured assessment is reported to the rights-grade governance layer at inference time, allowing the inference-time substrate to incorporate training-time provenance into its admissibility determination. When the inference substrate identifies output similar to a known training artifact, it queries the memorization detection module for the training-level assessment and acts on the returned category.
If the assessment indicates shallow memorization of properly governed content, the inference substrate may permit the output with an attribution annotation. If the assessment indicates deep memorization of content that should have been depth-restricted, the substrate may suppress the output and generate a governance alert. If the assessment indicates absent memorization, the substrate treats the similarity as coincidental and applies standard admissibility evaluation. The detection module thus does not act on the model directly; it supplies the inference-time gate with the provenance context needed to render the right admissibility decision.
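The mapping from assessment to inference-time action can be sketched as a pure function. The names `Category`, `Action`, `gate`, and the `depth_restricted` flag are hypothetical. The text says the substrate "may" permit or suppress, so the branches below are illustrative defaults rather than mandated behavior. The text also does not say what happens for deep memorization of content that was permitted full depth; routing that case to standard evaluation is an assumption.

```python
from enum import Enum


class Category(Enum):
    SHALLOW = "shallow"
    DEEP = "deep"
    ABSENT = "absent"


class Action(Enum):
    PERMIT_WITH_ATTRIBUTION = "permit_with_attribution"
    SUPPRESS_AND_ALERT = "suppress_and_alert"
    STANDARD_EVALUATION = "standard_evaluation"


def gate(category: Category, depth_restricted: bool) -> Action:
    """Choose an inference-time action from the training-level assessment.

    depth_restricted: whether governance policy required this content to be
    trained under a suppressed depth profile (assumed to be known to the caller).
    """
    if category is Category.SHALLOW:
        # Properly governed shallow retention: output allowed with attribution.
        return Action.PERMIT_WITH_ATTRIBUTION
    if category is Category.DEEP and depth_restricted:
        # Restricted content reached the deep layers: governance failure.
        return Action.SUPPRESS_AND_ALERT
    # Absent memorization, or deep integration that policy allowed (assumption).
    return Action.STANDARD_EVALUATION
```

Keeping the gate free of side effects reflects the division of labor described above: the detection module supplies context, and the inference substrate decides.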
The Provenance Log That Makes Detection Possible
Memorization detection is only as good as the log it queries. The training provenance log records, for each training batch or example, the entropy band classification, the slope position, the depth aggregation profile comprising the per-block contribution weight vector applied to the gradient signal, the per-layer contribution weights recording the gradient magnitude that actually reached each block, the governance record identifying the authorizing and depth-determining policy objects, the content provenance record, and an admissibility determination record indicating whether the example was admitted, rejected, or admitted with a modified depth profile.
The log is chronologically ordered and append-only. Each entry is timestamped, sequentially numbered, and annotated with the training epoch, iteration, and batch index. The append-only structure makes the log tamper-evident: entries cannot be retroactively modified, deleted, or reordered without producing detectable inconsistencies in the numbering and timestamp sequence. The log may be periodically sealed using the cryptographic sealing infrastructure of the cross-referenced governance disclosure, producing tamper-evident checkpoints for third-party verification. Reverse queries against this log do not definitively attribute model behavior to specific training content, because the non-linear dynamics of gradient-based optimization preclude exact attribution. They do identify the bounded set of content that was structurally permitted to influence the relevant layer blocks, a set substantially narrower than the full training corpus.
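The append-only structure and its consistency check can be sketched as follows. All names (`LogEntry`, `ProvenanceLog`, `find_inconsistencies`, `seal`) are hypothetical. The `seal` function uses a plain SHA-256 digest as a stand-in for a checkpoint; it is an assumption for illustration and is not the sealing infrastructure of the cross-referenced governance disclosure, which this article does not describe.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LogEntry:
    seq: int            # sequential number assigned at append time
    timestamp: float
    epoch: int
    iteration: int
    batch_index: int
    payload: str        # placeholder for the structured record contents


class ProvenanceLog:
    """Append-only container: entries can be added and read, never edited."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, timestamp: float, epoch: int, iteration: int,
               batch_index: int, payload: str) -> LogEntry:
        entry = LogEntry(len(self._entries), timestamp, epoch,
                         iteration, batch_index, payload)
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[LogEntry, ...]:
        return tuple(self._entries)


def find_inconsistencies(entries: Sequence[LogEntry]) -> List[str]:
    """Report gaps or reordering in the numbering and timestamp sequence."""
    problems = []
    for i, e in enumerate(entries):
        if e.seq != i:
            problems.append(f"position {i}: expected seq {i}, found {e.seq}")
        if i > 0 and e.timestamp < entries[i - 1].timestamp:
            problems.append(f"position {i}: timestamp earlier than previous entry")
    return problems


def seal(entries: Sequence[LogEntry]) -> str:
    """Illustrative checkpoint: SHA-256 over the serialized entries, in order."""
    h = hashlib.sha256()
    for e in entries:
        h.update(json.dumps(asdict(e), sort_keys=True).encode("utf-8"))
    return h.hexdigest()
```

In this sketch the sequence and timestamp checks catch deletion and reordering, while comparing a stored digest against a recomputed one is what would reveal an in-place change to an entry's contents.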
Detection Built on Structural Prevention
The premise that makes shallow versus deep classification meaningful is that depth restriction was enforced structurally during training, not approximated afterward. Content admitted under time-limited licensing is trained with a suppressed depth profile whose deep-layer contribution weights are set to zero or near-zero, confining the content's influence to shallow layers. Content from the governed exclusion corpus receives a zero-weight depth profile that sets the contribution weight to zero at every layer, preventing the example from influencing any parameter. These determinations are recorded in the log as governed events.
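The effect of a depth profile on an example's gradient signal can be sketched with plain per-block scaling. This is a simplification under stated assumptions: gradients are represented as one flat list of floats per layer block, the contribution weight multiplies that block's gradient directly, and the recorded magnitude is taken to be the L2 norm. All function names and the `deep_start` boundary are hypothetical.

```python
import math
from typing import List


def suppressed_profile(n_blocks: int, deep_start: int) -> List[float]:
    """Full weight on shallow blocks, zero weight from `deep_start` onward."""
    return [1.0] * deep_start + [0.0] * (n_blocks - deep_start)


def zero_profile(n_blocks: int) -> List[float]:
    """Zero weight at every block, as for governed exclusion corpus content."""
    return [0.0] * n_blocks


def apply_depth_profile(block_grads: List[List[float]],
                        profile: List[float]) -> List[List[float]]:
    """Scale each block's gradient by that block's contribution weight."""
    if len(block_grads) != len(profile):
        raise ValueError("profile must have one weight per layer block")
    return [[w * g for g in grads] for grads, w in zip(block_grads, profile)]


def contribution_magnitudes(scaled: List[List[float]]) -> List[float]:
    """L2 norm of the signal that reached each block (the value to be logged)."""
    return [math.sqrt(sum(g * g for g in grads)) for grads in scaled]


# Example: three blocks, only the first treated as shallow.
grads = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
scaled = apply_depth_profile(grads, suppressed_profile(3, deep_start=1))
logged = contribution_magnitudes(scaled)   # [5.0, 0.0, 0.0]
```

Because the weights are applied before any parameter update, the logged magnitudes describe what the deep blocks actually received, which is the property the shallow versus deep classification relies on.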
This distinguishes the approach from post-hoc unlearning. Post-hoc unlearning operates after training: it identifies content that should not have been learned, approximates that content's diffuse influence across the model's parameters, and applies corrective updates to attenuate it. Because a single example's influence is diffused across millions or billions of parameters through non-linear optimization dynamics, the approximation is inherently imprecise. The present disclosure does not unlearn. It prevents: content whose governance profile restricts deep integration is kept out of the deep layers at training time, before the gradient signal reaches them. There is no need to unlearn what was never deeply learned, and a memorization assessment of shallow memorization for properly governed content reflects a confinement that was applied exactly rather than estimated.
Composition With Other Primitives
Memorization detection composes with the content provenance and content anchoring infrastructure that populates the log. When training content carries a verified anchored identity derived from its own structural entropy, its provenance record is enriched with the anchor identity, enabling reverse queries to trace a model capability back to specific anchored content regardless of how the content was acquired or transformed before training. Content without a verified anchored identity is flagged in the log as provenance-incomplete, and governance policy may restrict its depth profile to shallow layers, so that content of unverifiable origin cannot be deeply integrated in the first place.
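One way to picture the provenance-incomplete rule is as an admission step that clamps the depth profile when no anchor identity is present. The sketch below is hypothetical throughout: `AdmittedExample`, `admit`, and `deep_start` are invented names, a non-empty `anchor_id` stands in for whatever verification the anchoring infrastructure performs, and the text says only that policy "may" restrict such content, so the clamp is one possible policy rather than required behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class AdmittedExample:
    example_id: str
    anchor_id: Optional[str]        # verified anchored identity, if any
    provenance_incomplete: bool     # flag written to the log
    depth_profile: List[float]      # profile actually applied during training


def admit(example_id: str,
          anchor_id: Optional[str],
          requested_profile: Sequence[float],
          deep_start: int) -> AdmittedExample:
    """Admit an example, restricting depth when its origin cannot be verified."""
    if anchor_id is not None:
        # Anchored content keeps the profile that policy requested for it.
        return AdmittedExample(example_id, anchor_id, False, list(requested_profile))
    # Unanchored content: zero out every deep-block weight (assumed policy).
    restricted = [w if i < deep_start else 0.0
                  for i, w in enumerate(requested_profile)]
    return AdmittedExample(example_id, None, True, restricted)
```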
Detection also supports compliance auditing. When a content owner asks whether their content was used in training, the log answers definitively: either the content was present, with its provenance record, depth profile, and contribution weights available, or it was absent. When a regulatory authority requires evidence that restricted content was not deeply integrated, the depth profile records show the contribution weights that were applied. The memorization classification of a flagged output then situates a specific inference-time similarity within this audit record, distinguishing properly governed shallow retention from a deep-integration governance failure.
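A compliance query of the kind described above might look like the following sketch. The names `Record`, `audit_content`, `content_id`, and `deep_start` are hypothetical, and matching a content owner's work to log entries by a single identifier is an assumption; in practice that matching would depend on the provenance and anchoring infrastructure.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Record:
    content_id: str
    depth_profile: Sequence[float]   # weights applied per block
    contribution: Sequence[float]    # magnitude that reached each block


def audit_content(log: Sequence[Record], content_id: str,
                  deep_start: int) -> Dict[str, object]:
    """Answer "was this content used, and how deeply?" from the log alone."""
    records: List[Record] = [r for r in log if r.content_id == content_id]
    deep_weights = [w for r in records for w in r.depth_profile[deep_start:]]
    return {
        "present": bool(records),
        "record_count": len(records),
        # Largest weight applied to any deep block; 0.0 if absent or fully suppressed.
        "max_deep_weight": max(deep_weights, default=0.0),
        "records": records,
    }
```

A `max_deep_weight` of zero alongside `present: True` is the kind of evidence a regulator would look for when asking whether restricted content was kept out of the deep layers.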
Disclosure Scope
Training-level memorization detection is disclosed in the cognition filing (U.S. Application No. 19/647,395 and its international counterpart) at Section 11.7. The disclosed mechanism comprises the reverse provenance query initiated against the training provenance log when inference output is flagged as similar to a known training artifact; the retrieval of the depth aggregation profile, per-layer contribution weights, entropy band, policy scope, and governance records for the corresponding training examples; the structured memorization assessment classifying the similarity as shallow memorization, deep memorization, or absent memorization; and the reporting of that assessment to the rights-grade inference-time governance layer to enrich its admissibility determination. The supporting provenance-traceable training dynamics, policy-governed suppression, and inference-time integration are disclosed at Sections 11.5, 11.6, and 11.10. This article describes that disclosed mechanism. The scope extends to embodiments in which the flag originates from the rights-grade governance layer, an external content identification service, or a human reviewer, and to embodiments in which the depth profiles consulted by the assessment were enforced through gated residual connections, attention-based depth selection, or layer-specific scaling factors, provided the assessment is derived from the recorded depth profile and contribution weights rather than from post-hoc estimation of a finished model.