Depth-Selective Training Governance for Machine Learning Systems

by Nick Clark | Published March 26, 2026

Contemporary machine learning training pipelines treat all training content as uniformly integrable: every example contributes gradients to every parameter at every depth, with no governance over how deeply any specific datum shapes the resulting model. There is no per-layer admissibility, no per-source provenance binding, and no event-level lineage tying a specific weight delta back to the example that produced it. By the time training completes, optimization itself has destroyed exactly the information rights holders, regulators, and downstream operators need most. This article presents a depth-selective training governance architecture that evaluates each candidate example against semantic metadata and rights classification, assigns an entropy-indexed depth profile, and routes gradients to specific model layers through per-layer weighting vectors enforced inside the optimizer. The architecture binds skill credentials, source attestations, and policy bands to the parameters they shape, producing a training event ledger that is queryable in both directions. Rights compliance, memorization risk, and knowledge formation depth become structural properties of the training loop rather than retroactive audits or post-hoc filters layered over an opaque pretraining run.


1. Problem and architectural premise

Every commercial foundation model is trained on content whose provenance is uncertain, whose rights status is contested, and whose influence on subsequent model behavior is structurally untraceable. A pretraining corpus typically contains licensed data, openly licensed data, scraped public data, rights-restricted data, and a long tail of content whose status is simply unknown. The optimizer treats all of it identically. Gradients computed from a paywalled news article and gradients computed from a public domain text flow through the same backward pass, modify the same parameters, and become indistinguishable inside the resulting weights.

The downstream consequences are not academic. Once training completes, no internal mechanism exists to determine which example influenced which parameter, at what depth, or with what magnitude. Standard training is a one-way function over provenance: the act of optimization destroys the very lineage information rights holders, regulators, and customers increasingly demand. Model cards, dataset documentation, and content-level filters operate at the corpus boundary; they say nothing about what happened inside the parameters. Post-hoc influence functions, membership-inference probes, and data-attribution heuristics are expensive, approximate, and typically restricted to small perturbations of small models.

The architectural premise of depth-selective training governance is that this is not a documentation problem. It is a control-flow problem. If the training loop is to produce a model whose knowledge formation is governed, then governance must be applied where knowledge is actually formed: inside the gradient computation, at the granularity of layers, and at the time the parameters are written. Treating training as an opaque box and bolting documentation around it preserves the very property that needs to be eliminated — the uniform, indiscriminate integration of every example into every parameter.

The premise has a second, equally important component. Depth in a deep network is not a uniform coordinate; it is a functional stratification. Layers at different depths encode different kinds of knowledge, change at different rates, and influence downstream computation differently. Once depth is recognized as a governable dimension rather than a fixed substrate, it becomes possible to route specific kinds of content to specific kinds of parameters, to attest which skills were certified at which depths, and to enforce rights bands as structural constraints on the optimizer rather than as advisory metadata.

2. The core architectural primitive: depth as a governed coordinate

The core primitive is the depth profile. A depth profile is a per-layer weighting vector w = (w_1, w_2, ..., w_L) with one entry per trainable layer of the target network, where each entry specifies the fraction of the example's loss-derived gradient that is permitted to modify parameters in that layer. Entries are typically constrained to [0, 1], although signed values supporting suppression and unlearning are part of the disclosure. A profile of all ones reproduces conventional unconstrained training; a profile that is non-zero only over a contiguous shallow band confines an example to surface adaptation; a profile concentrated in middle layers shapes behavioral tendency without disturbing foundational representation; a profile of all zeros admits the example for attestation purposes only, recording its presence in the training event ledger without allowing it to modify any weight.
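
As a concrete illustration, the following sketch expresses the primitive in plain NumPy. The layer count, band edges, helper name, and band labels are example values chosen for the sketch, not prescribed parameters of the architecture.

```python
import numpy as np

def band_profile(num_layers: int, start_frac: float, end_frac: float,
                 weight: float = 1.0) -> np.ndarray:
    """Depth profile w = (w_1, ..., w_L): non-zero only over the contiguous
    band of layers whose fractional depth lies in [start_frac, end_frac)."""
    w = np.zeros(num_layers)
    w[int(start_frac * num_layers):int(end_frac * num_layers)] = weight
    return w

L = 48                                           # trainable layers (example value)
unconstrained    = np.ones(L)                    # conventional, ungoverned training
lexical_surface  = band_profile(L, 0.85, 1.00)   # top ~15% of layers: surface adaptation
behavioral_mid   = band_profile(L, 0.30, 0.70)   # central band: behavioral tendency
attestation_only = np.zeros(L)                   # ledger presence only, no weight change
```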

Depth profiles are not chosen by hand. They are produced by a governance evaluator that consumes three inputs: the example's semantic metadata (content type, topic, source, creation date, modality), its rights classification (licensed, exclusion-listed, time-limited, jurisdictionally restricted, public domain, derivative), and its credentialed skill signature (which capabilities the example is intended to reinforce, and which credentials the source holds for contributing to those capabilities). The evaluator maps the joint signature into a depth band drawn from a small, declared set of bands — for example, a "lexical surface" band over the top 10–15% of layers, a "behavioral middle" band over the central 40–60%, and a "foundational deep" band over the bottom 15–25% — with smooth, optionally entropy-indexed falloff at band boundaries.
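
A minimal evaluator sketch follows, assuming an illustrative policy table and a half-cosine falloff at band edges; the field names, band fractions, ramp shape, and fail-closed default are stand-ins for whatever an operator's policy actually declares.

```python
import numpy as np

def smooth_band_profile(num_layers, start_frac, end_frac, ramp_frac=0.05):
    """Band profile with a smooth (half-cosine) falloff over ramp_frac of the
    layer stack on each side of the band."""
    x = (np.arange(num_layers) + 0.5) / num_layers
    rise = np.clip((x - start_frac) / ramp_frac, 0.0, 1.0)
    fall = np.clip((end_frac - x) / ramp_frac, 0.0, 1.0)
    return 0.5 * (1 - np.cos(np.pi * rise)) * 0.5 * (1 - np.cos(np.pi * fall))

# Illustrative policy table: (rights class, skill credential) -> admissible band.
POLICY_BANDS = {
    ("public_domain", "any"):                (0.00, 1.00),  # any band admissible
    ("commercial_license", "summarization"): (0.30, 1.00),  # middle + surface only
    ("exclusion_listed", "any"):             None,          # no weight modification
}

def evaluate(example_meta, num_layers):
    key = (example_meta["rights_class"], example_meta["skill"])
    band = POLICY_BANDS.get(key) or POLICY_BANDS.get((key[0], "any"))
    if band is None:
        return np.zeros(num_layers)  # fail closed: attestation-only or rejected, per policy
    return smooth_band_profile(num_layers, *band)
```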

Two properties make this primitive different from conventional layer-wise learning rate schedules or LoRA-style adapter scoping. First, the profile is bound to the example, not to the optimizer step or to a global schedule: two examples in the same minibatch may carry different profiles and modify disjoint depth bands. Second, the profile is governance-derived and recorded, not engineering-tuned: the evaluator's decision, the inputs that produced it, and the resulting per-layer weights are written to a tamper-evident training event ledger keyed by example identifier, source attestation, and parameter checkpoint.

Depth profiles compose. A minibatch's effective per-layer update is the profile-weighted sum of its example gradients, which means rights bands aggregate naturally across batches, sources aggregate naturally across the corpus, and skill credentials aggregate naturally across the training run. The optimizer continues to operate as a standard first- or second-order method; only the gradients it consumes are governance-shaped before they arrive.
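
The composition rule can be stated directly; the sketch below assumes per-example, per-layer gradients are available as arrays, with shapes and names purely illustrative.

```python
def batch_layer_updates(example_grads, profiles):
    """Effective per-layer minibatch update: the profile-weighted sum of the
    examples' per-layer gradients.  example_grads[i][l] is example i's gradient
    w.r.t. layer l's parameters; profiles[i] holds one weight per layer."""
    num_layers = len(example_grads[0])
    return [sum(profiles[i][l] * example_grads[i][l]
                for i in range(len(example_grads)))
            for l in range(num_layers)]
```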

3. Credentialed contribution attestation and per-source provenance

Before any example reaches the depth-profile evaluator, it passes through a contribution attestation gate. A contribution attestation is a signed statement, anchored to a registered source identity, declaring that the source holds rights sufficient to contribute the example for training under a stated set of bands and skills. Attestations are typed: a public-domain attestation may admit any band; a commercial-license attestation may admit only middle and surface bands and only for declared skills; a synthetic-data attestation may carry a generator identity and an upstream attestation chain. Examples without a valid attestation either fail closed (rejected) or fall back to a degraded attestation-of-last-resort band that is typically empty or surface-only.
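
A sketch of the gate under assumed field names; the attestation type, the verify() hook, and the empty-tuple rejection are illustrative of the contract rather than a specific schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Attestation:
    source_id: str
    kind: str                        # e.g. "public_domain", "commercial_license", "synthetic"
    admitted_bands: Tuple[str, ...]  # e.g. ("behavioral_middle", "lexical_surface")
    skills: Tuple[str, ...]
    signature: bytes                 # verified against the registered source identity

def attestation_gate(attestation: Optional[Attestation],
                     verify: Callable[[Attestation], bool]) -> Tuple[str, ...]:
    """Return the bands the example may enter; an empty tuple means the example
    failed closed (or, under a softer policy, falls back to surface-only)."""
    if attestation is None or not verify(attestation):
        return ()
    return attestation.admitted_bands
```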

Each accepted example is assigned a stable content-addressed identifier, hashed against the source attestation and the active policy version. The identifier travels with the example through tokenization, batching, gradient computation, and optimizer update. When the optimizer applies the profile-weighted gradient, it emits a training event record: example identifier, attestation chain, governance evaluator version, depth profile, per-layer parameter delta norms, optimizer step number, and resulting checkpoint hash. The records form an append-only ledger from which both forward queries (which parameters did this source shape?) and reverse queries (which sources shaped this parameter band?) can be answered without re-running training.
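
The identifier and the event record can be sketched as follows, assuming a SHA-256 content address and a JSON encoding chosen for readability; a production ledger would use a fixed binary schema, and the field names here are illustrative.

```python
import hashlib
import json

def example_id(content: bytes, attestation_digest: str, policy_version: str) -> str:
    """Stable content-addressed identifier, bound to attestation and policy version."""
    h = hashlib.sha256()
    for part in (content, attestation_digest.encode(), policy_version.encode()):
        h.update(part)
    return h.hexdigest()

def event_record(ex_id, attestation_chain, evaluator_version, profile,
                 per_layer_delta_norms, step, checkpoint_hash):
    """One append-only ledger entry, emitted when the gated gradient is applied."""
    return json.dumps({
        "example_id": ex_id,
        "attestation_chain": attestation_chain,
        "evaluator_version": evaluator_version,
        "depth_profile": [float(v) for v in profile],
        "per_layer_delta_norms": [float(v) for v in per_layer_delta_norms],
        "optimizer_step": step,
        "checkpoint_hash": checkpoint_hash,
    }, sort_keys=True)
```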

Per-source provenance is therefore a structural property of the trained weights, not a corpus-level claim. A licensor who later disputes a usage can be answered with the exact depth bands, parameter norms, and event timestamps attributable to their content. Conversely, a regulator inquiring whether a specific exclusion list was honored can verify it directly against the ledger rather than against a sampled audit of inputs.

4. Depth-selective gradient gating and training-inference integration

Depth-selective gradient gating is the optimizer-level enforcement of the depth profile. Concretely, the backward pass is instrumented so that the gradient tensor produced for each layer is multiplied element-wise (or block-wise, for grouped parameter classes) by the corresponding profile entry before being passed to the optimizer's update rule. Under standard SGD, this is equivalent to a per-example, per-layer learning-rate mask; under adaptive optimizers, the masking is applied to the gradient before the running statistics are updated, so that masked layers are not poisoned by suppressed examples through the second-moment buffer. Layers at zero profile receive no gradient and contribute nothing to optimizer state for that example.
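
The PyTorch sketch below applies the gate with one backward pass per example for clarity, rather than instrumenting a single batched backward pass as described above; the layer container, the profile sequence, and the loop structure are assumptions of the sketch, not requirements of the mechanism.

```python
import torch

def gated_step(model, optimizer, batch, loss_fn):
    """One governed optimizer step.  `batch` yields (x, y, profile), where
    profile holds one weight per entry of model.layers (assumed to be an
    ordered container of the trainable layers).  Gating is applied before
    gradients reach the optimizer, so adaptive running statistics never see
    the suppressed portion of any example's gradient."""
    optimizer.zero_grad()
    for x, y, profile in batch:                       # one backward per example
        loss = loss_fn(model(x), y)
        for depth, layer in enumerate(model.layers):
            params = [p for p in layer.parameters() if p.requires_grad]
            if not params:
                continue
            grads = torch.autograd.grad(loss, params, retain_graph=True,
                                        allow_unused=True)
            for p, g in zip(params, grads):
                if g is None:
                    continue
                if p.grad is None:
                    p.grad = torch.zeros_like(p)
                p.grad.add_(float(profile[depth]) * g)  # depth-selective gate
    optimizer.step()                                    # standard update rule
```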

The gating mechanism is paired with a training–inference integration contract. Skills that were certified during training — meaning the example set that shaped a given depth band satisfied a stated credential and policy — carry an inference-time admissibility token. At inference, the cognition runtime can require, for a given query class, that the responsible skill chain was trained under bands and credentials consistent with the invocation context. A capability whose deep-layer foundation was shaped only by attested, in-policy content is admissible in policy-restricted deployments; a capability whose deep band ledger contains unverified content can be inference-gated, scoped to shallower execution, or refused. This is the sense in which training governance projects forward into runtime: the same ledger that justifies the weights also conditions their use.
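
A minimal sketch of the admissibility check, assuming ledger records are exposed as dictionaries with illustrative field names; the ledger encoding and query interface are implementation choices.

```python
def deep_band_admissible(events, skill, allowed_policy_bands):
    """A skill chain is admissible for a policy-restricted query class only if
    every training event that shaped its foundational band carries an
    in-policy attestation (vacuously true if the band was never modified)."""
    deep_events = [e for e in events
                   if e["skill"] == skill and e["band"] == "foundational_deep"]
    return all(e["attestation_policy_band"] in allowed_policy_bands
               for e in deep_events)
```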

Because the profile is recorded per example and per layer, the integration contract is checkable at fine resolution. An audit can answer whether the parameters supporting a specific skill were ever modified by a specific source, at any point in the training history, at any depth, and the answer is exact rather than statistical.

5. Memorization detection and regulated-autonomy training

Memorization is detected at training time, not after deployment. The governance loop monitors the per-example, per-layer gradient norm and the alignment between the example's gradient and the running parameter delta. When the gradient contribution from a single example to a small set of parameters exceeds an entropy-indexed threshold — indicating that the optimizer is encoding the example rather than learning a generalizing pattern — the event is flagged. The flagged example may be excluded from the remainder of the run, its profile may be restricted to shallower bands, its gradient magnitude may be attenuated, or its update may be replaced with an explicit anti-memorization update that drives the parameter delta back toward a smoothed neighborhood. Each response is itself a governed training event recorded in the ledger.
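
A sketch of the flagging rule, assuming per-layer gradient norms and rolling statistics are maintained elsewhere in the loop; the z-score form and the cutoff arrays are illustrative.

```python
import numpy as np

def memorization_flags(example_grad_norms, rolling_mean, rolling_std, band_cutoff_sds):
    """All arguments are per-layer arrays.  A True entry flags the example at
    that layer; the governed response (exclusion, shallower profile,
    attenuation, or an anti-memorization update) is chosen by policy and is
    itself recorded in the ledger."""
    z = (example_grad_norms - rolling_mean) / (rolling_std + 1e-8)
    return z > band_cutoff_sds
```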

Memorization detection composes with the rights system. An example whose attestation permits middle-band contribution but whose observed gradient signature is concentrated in a few deep parameters is treated as a policy violation, not merely a generalization failure. The same machinery that prevents over-encoding of any specific example prevents silent deep-band leakage of restricted content.

Regulated-autonomy training extends the loop to the case where the model itself proposes data for inclusion: self-generated examples, retrieved augmentations, or mutation proposals derived from observed inference outcomes. Such proposals enter the same attestation gate, but with a generator identity, a derivation chain, and a regulated band that is typically narrower than the bands available to attested external content. The system is permitted to learn from its own operation, but only at depths and under credentials that policy admits, and only with full lineage to the governed events that produced the proposal.

6. Operating parameters and engineering envelope

The architecture is parameterized along several axes. Depth band cardinality is typically three to seven bands per network, with surface bands occupying the top 8–20% of layers, behavioral bands occupying 30–60%, and foundational bands occupying the bottom 10–25%; band edges may be fixed by layer index, by cumulative parameter count, or by a learned functional partition. Per-layer profile weights are usually quantized to 8–16 levels for ledger compactness, with continuous interior representation during the optimizer step.

Memorization thresholds are entropy-indexed: the per-example, per-parameter gradient magnitude is normalized against a running estimate of the layer's gradient entropy, and flags fire when the normalized score exceeds a band-specific cutoff, typically in the range of 4–8 standard deviations above the rolling mean for surface bands and 2–4 for deep bands, reflecting the lower tolerance for deep-band over-encoding. Attestation gate latency is amortized by caching evaluator decisions per source-policy pair; in steady state the per-example governance overhead is dominated by ledger writes rather than policy evaluation.
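
Gathering the ranges above into a single illustrative configuration; the specific values are operator choices within the stated envelopes, not disclosed constants.

```python
GOVERNANCE_CONFIG = {
    "bands": {                                   # fractions of the layer stack
        "foundational_deep": (0.00, 0.20),
        "behavioral_middle": (0.20, 0.85),
        "lexical_surface":   (0.85, 1.00),
    },
    "profile_quantization_levels": 16,           # ledger encoding; continuous inside the step
    "memorization_cutoff_sds": {                 # z-score cutoffs over the rolling mean
        "lexical_surface":   6.0,                # within the 4-8 SD surface-band range
        "behavioral_middle": 4.0,
        "foundational_deep": 3.0,                # within the 2-4 SD deep-band range
    },
    "attestation_cache_key": ("source_id", "policy_version"),
}
```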

Throughput overhead from gradient gating is modest. Element-wise per-layer masking adds a single fused multiply per parameter group per backward pass, on the order of single-digit percent overhead at typical batch sizes. Ledger writes are batched per optimizer step and stream to an append-only store sized to the corpus; for a multi-billion example pretraining run the ledger is on the order of low terabytes when records are content-addressed and de-duplicated against attestation chains. Recovery from a governance-aborted run uses the ledger plus the most recent governed checkpoint, with no requirement to replay attestation evaluation against the entire corpus.

7. Alternative embodiments

The architecture admits several embodiments beyond dense transformer pretraining. In a parameter-efficient embodiment, depth profiles select among adapter modules (LoRA, IA³, prefix banks) rather than among native layers, with each adapter tagged to a band and a credential. The mathematics are unchanged: profile weights gate adapter gradients; the ledger records adapter-level events. This embodiment is attractive for governed continual learning, where a tenant's content modifies only that tenant's adapters and never the shared backbone.

In a mixture-of-experts embodiment, depth profiles are augmented by expert profiles: an example is routed not only to a depth band but to a permitted subset of experts. Rights-restricted content can be confined to clearly delineated expert routes, simplifying both audit and surgical removal. In a retrieval-augmented embodiment, the same governance evaluator gates whether a retrieved passage may flow into a parameter update at all (versus merely conditioning a forward pass), with the typical answer being a narrow surface-band profile or a profile of zeros.

Unlearning is a natural dual embodiment: a signed profile with negative entries, applied to the example at the targeted bands, drives the parameters away from the example's encoded direction; the ledger records the unlearning event with the same fidelity as the original incorporation, and skill credentials are recomputed against the post-unlearning band. Federated and split-learning embodiments push the evaluator to the data owner, who emits attestations and band-restricted gradients to a central aggregator that never sees the underlying examples; the ledger is then a federation of per-owner sub-ledgers anchored to a shared checkpoint timeline.
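
Returning to the unlearning dual, the signed profile is a one-line construction in the notation of the earlier sketches; the band edges and magnitude are illustrative.

```python
import numpy as np

# A negative profile over the targeted band reuses the same gating path but
# drives the parameters away from the example's encoded direction.
L = 48
unlearn_profile = np.zeros(L)
unlearn_profile[: int(0.25 * L)] = -0.5   # signed weight over a foundational band (example values)
```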

8. Composition with the broader cognition architecture

Depth-selective training governance does not stand alone. It is the training-time face of a broader cognition substrate in which semantic objects, policy-bound execution, and governed memory share a common attestation and lineage discipline. The training event ledger is structurally compatible with the runtime execution ledger: skill credentials certified during training are consumed by the runtime admissibility evaluator that governs inference; provenance identifiers used to tag training examples are the same identifiers used to tag retrieved content, tool invocations, and memory-resident objects at execution time.

Composed with memory-resident execution, depth-selective training enables an end-to-end governance arc: a semantic object carrying a specific policy band can require, at execution time, that any model invoked on its behalf was trained under a compatible band, and the runtime can verify this against the training ledger without trusting an external attestation. Composed with adaptive indexing, the source-keyed provenance of training events allows the index to surface the training-time origin of any inference-time behavior, closing the loop between what the system learned and what the system does.

The architecture is also compositional with respect to model lifecycle. Fine-tuning, domain adaptation, instruction tuning, and reinforcement-learning post-training each consume the same depth-profile and attestation primitives, recorded in the same ledger. A model's full training history — pretraining, every fine-tune, every alignment update — is a single, queryable structure. This is the substrate property that distinguishes governed knowledge formation from a sequence of separately documented training stages.

Composition extends to mutation proposals: when an execution-time semantic object proposes an update to model behavior — for example, a corrected response, a refined skill, or a removed capability — the proposal enters the training governance pipeline as a candidate example with a derivation chain anchored in the runtime ledger. The training event ledger and the runtime execution ledger are therefore not parallel logs; they are two halves of a single lineage graph in which every parameter delta points to the events that produced it and every execution event points to the parameter bands that admitted it. The cognition substrate's claim to governed knowledge formation rests on this graph being closed at both ends rather than on documentation surrounding either end in isolation.

9. Prior-art distinctions

The architecture is distinct from several adjacent approaches. Post-hoc content filters and dataset documentation operate at the corpus boundary and provide no parameter-level governance; they cannot answer the per-layer, per-source questions the ledger answers structurally. Differential privacy mechanisms bound the influence of any single example on the model uniformly across all parameters; they neither express nor enforce depth structure, nor do they record per-source lineage of the surviving influence. Layer-wise learning-rate schedules and discriminative fine-tuning select learning rates by layer globally per training stage; they do not vary per example, do not bind to attestations, and do not produce a queryable event ledger.

Reward-model-based alignment and RLHF reward-hacking detection operate over reward signals at fine-tune time and do not govern depth of integration during pretraining; they detect optimization pathologies but do not constrain which content may shape which parameters. Adapter-only training (LoRA, prefix tuning) restricts updates to a small module but does not, by itself, encode rights bands, credentialed contributions, or event-level lineage. Influence functions and TracIn-style attribution are post-training, approximate, and computationally intensive; they reconstruct estimates of what the ledger records exactly and prospectively.

The novel combination disclosed here is the binding of (a) per-example, per-layer gradient routing, (b) credentialed source attestations and skill signatures, (c) an append-only training event ledger, and (d) a runtime admissibility contract that consumes the ledger to gate inference. No prior approach provides all four at once, and the four together are what convert training from an opaque transformation into a governed execution environment.

10. Disclosure scope

This article describes structural mechanisms of a depth-selective training governance architecture: the depth profile primitive, the attestation gate, the gradient gating implementation, the training event ledger, the memorization-detection loop, and the training–inference integration contract, together with their alternative embodiments and compositions. The disclosure is at the level of architecture and method, not of any specific deployment. Specific layer-band assignments, attestation schemas, ledger encodings, threshold values, and integration policies are implementation parameters chosen by an operator under their own regulatory and engineering constraints.

Statements about overhead, throughput, and storage are characteristic ranges intended to communicate that the architecture is engineering-feasible at modern training scales, not performance guarantees for any specific system. Claims of rights compliance, regulatory sufficiency, or memorization elimination depend on operator policy, attestation quality, and verification practice; the architecture provides the structural substrate on which such claims can be made and audited, not the claims themselves.

Invented by Nick Clark
Founding Investors: Anonymous, Devin Wilkie