Regulated Industry Model Governance With Provenance
by Nick Clark | Published March 27, 2026
Regulated institutions deploying AI face a convergence of supervisory expectations that current training practices cannot satisfy. The Federal Reserve's SR 11-7 and the OCC's heightened standards under 12 CFR Part 30 demand documented understanding of every model's data, assumptions, and limitations. The EU AI Act binds high-risk providers to Article 9 risk management, Article 17 quality management, and Article 18 record-keeping obligations that cannot be discharged by narrative documentation alone. SR 21-13 raises the bar for ongoing monitoring; the EBA's machine-learning discussion paper and the NAIC 2023 AI Bulletin extend analogous expectations into European banking and U.S. insurance. Training governance with structural provenance tracing supplies what these regimes actually require: a verifiable chain from training data through gradient updates to model parameters, sustained across the lifecycle, and auditable independently of the institution's own attestations.
Regulatory framework
SR 11-7, jointly issued by the Federal Reserve and the OCC, frames model risk as a function of conceptual soundness, ongoing monitoring, and outcomes analysis, and obliges firms to maintain documentation sufficient for an independent reviewer to understand the model without consulting its developers. The OCC's guidelines at 12 CFR Part 30, Appendix D codify those governance expectations as enforceable heightened standards. SR 21-13 sharpens the monitoring obligation by demanding that firms detect deterioration in real operating conditions, not merely at validation. The EU AI Act, applicable to AI systems placed on the EU market regardless of provider domicile, requires under Article 9 that risk management be a continuous iterative process across the system's lifecycle; Article 17 mandates a quality management system with documented procedures for data management and change control; Article 18 sets retention obligations for the technical documentation, automatically generated logs, and post-market monitoring evidence.
The NIST AI Risk Management Framework provides the lingua franca that supervisors increasingly use to evaluate institutional programs, and ISO/IEC 42001 supplies the certifiable management-system structure that auditors expect to see implemented. The EBA discussion paper on machine learning in IRB models and the NAIC 2023 Bulletin on the use of AI by insurers extend analogous principles into prudential and conduct supervision. Across all of these, the underlying demand is identical: training, validation, and monitoring must be structurally evidenced, not merely described.
Architectural requirement
The bodies that ultimately accept this evidence are the institution's model risk management function, its second-line validation team, the external auditor reviewing financial-statement-relevant models, and, increasingly, the supervisor exercising direct examination authority. Each operates from a different position but converges on the same question: can the institution demonstrate, with artifacts it did not author after the fact, that the model in production is the model the controls were designed for? The training infrastructure is the only place that question can be answered honestly.
Satisfying these regimes requires that the training pipeline emit, as a primary output alongside the model itself, a tamper-evident lineage record that links each parameter update to its data origin, governance decisions, validation status, and depth of integration. The record must be queryable by an independent reviewer without re-running training. It must persist for the retention horizon imposed by Article 18 and by the institution's own model-risk policy. It must support the change-control discipline that SR 11-7 and ISO/IEC 42001 require, so that any subsequent fine-tune, retraining cycle, or data correction is provably scoped and provably non-destructive of previously validated capabilities. And it must be generated as a structural property of the training loop, because any artifact produced outside the loop is, by definition, separable from the model and therefore unverifiable.
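To make that requirement concrete, here is a minimal sketch of a tamper-evident lineage log in Python, assuming one JSON-serializable record per optimizer step; the field names and the SHA-256 hash chain are illustrative choices rather than a fixed schema. The structural point is that each entry commits to the hash of its predecessor, so any after-the-fact edit breaks every downstream hash, and a reviewer can check integrity from the log alone.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class LineageEntry:
    """One record per parameter update. Field names are illustrative."""
    step: int
    data_origin: str         # stable identifier of the example or batch source
    governance_profile: str  # policy profile the input carried into the loop
    validation_status: str   # e.g. "validated", "pending"
    layers_touched: list     # layer names that received gradient mass
    prev_hash: str           # hash of the preceding entry: the chain itself

def entry_hash(entry: LineageEntry) -> str:
    # Canonical serialization so an independent reviewer reproduces the hash.
    payload = json.dumps(asdict(entry), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append(log: list, **fields) -> None:
    prev = entry_hash(log[-1]) if log else "genesis"
    log.append(LineageEntry(prev_hash=prev, **fields))

def verify(log: list) -> bool:
    """Integrity check that needs the log alone, not a re-run of training."""
    return all(
        log[i].prev_hash == entry_hash(log[i - 1]) for i in range(1, len(log))
    )
```

A production system would additionally anchor the chain head in an external timestamping or write-once store, so the institution cannot silently regenerate the whole log; the sketch omits that step.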
Why procedural compliance fails
The current dominant artifacts of model documentation are model cards and dataset datasheets. Both are useful as communication tools and inadequate as compliance tools. A model card describes intended use, performance, and limitations; a datasheet describes corpus composition and collection. Neither is structurally bound to the model. A model card can be inaccurate, out of date, or simply wrong about what the deployed weights actually encode. A datasheet can omit ingest events that occurred late in training. Neither admits independent verification: a regulator reading a model card has no mechanism to confirm that the card describes the binary in production.
Validation reports written under SR 11-7 inherit the same fragility when the underlying training process is opaque. Effective challenge requires that the challenger be able to interrogate the training lineage; if the lineage is reconstructed from emails, notebooks, and developer recollection, the challenge is degraded to a review of narrative. Fine-tuning compounds the problem: a fine-tuned model inherits the provenance gaps of its base, and adds its own. Under EU AI Act Article 17, a quality management system that cannot demonstrate which data influenced which capability is a system that cannot evidence its own change controls. Under SR 21-13, ongoing monitoring that lacks a baseline tied to specific training inputs cannot distinguish drift from undocumented retraining. Procedural compliance reaches its limits at the point where the regulator asks not what the institution intends but what the model in production actually contains.
What the AQ primitive provides
Adaptive Query training governance maintains structural provenance throughout the training process. Every training example's contribution is recorded at the gradient level: which input entered the loop, what governance profile it carried, which layers received its updates, and how its influence integrated with the rest of the corpus. Entropy-depth profiles characterize the structural manner in which training data shaped the model at each layer band, allowing a validator to distinguish deep, broadly integrated competence from shallow surface mimicry. This is the diagnostic that effective challenge under SR 11-7 has historically lacked.
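A rough sketch of how such a profile might be computed inside a PyTorch training loop follows. The banding of layers by parameter order and the use of squared gradient norms as update mass are illustrative stand-ins for whatever a production implementation measures; what matters is the shape of the output: a per-band distribution of where an example's update landed, plus an entropy summary that separates broad integration from shallow concentration.

```python
import math
from collections import defaultdict
import torch

def entropy_depth_profile(model: torch.nn.Module, n_bands: int = 4) -> dict:
    """Summarize, after loss.backward(), where gradient mass landed by depth.

    Banding by parameter order is a crude proxy for true layer depth; a real
    implementation would use an explicit layer-to-band map.
    """
    params = [p for _, p in model.named_parameters() if p.grad is not None]
    band_mass = defaultdict(float)
    for i, p in enumerate(params):
        band = min(i * n_bands // max(len(params), 1), n_bands - 1)
        band_mass[band] += p.grad.norm().item() ** 2
    total = sum(band_mass.values()) or 1.0
    dist = {band: mass / total for band, mass in band_mass.items()}
    # High entropy: update mass spread across depth bands (broad integration).
    # Low entropy: mass concentrated in one band (shallow, surface-level fit).
    entropy = -sum(q * math.log(q) for q in dist.values() if q > 0)
    return {"band_distribution": dist, "entropy": entropy}
```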
The governed training loop ensures that every step complies with policy as a precondition of execution. Data that an institutional policy excludes from a given domain is excluded structurally rather than by post-hoc filter, and the exclusion event is logged as part of the lineage. Fine-tuning provenance tracks the precise inputs of each fine-tune, supporting the change-control requirements of EU AI Act Article 17 and the model-update controls of SR 11-7. Knowledge-retention monitoring detects when training degrades a previously validated capability and can gate further updates against regression, addressing the catastrophic-forgetting failure mode that has caused regulated firms to revert to older model versions after fine-tuning campaigns. Risk-tiered routing aligns with EU AI Act Article 9 by treating high-risk training inputs with proportionate scrutiny inside the loop, rather than only at deployment. Together these primitives convert documentation from an artifact the institution authors into evidence the training system emits.
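The ordering that makes this work can be sketched in a few lines. Everything named below (policy.evaluate, retention_monitor.regressed, the lineage and batch interfaces) is hypothetical; the point is the control flow: policy admission before any gradient exists, retention gating before the update persists, and a lineage event on every path, including the excluded ones.

```python
def governed_step(batch, model, optimizer, policy, lineage, retention_monitor):
    """One governed training step; all governance interfaces are hypothetical."""
    decision = policy.evaluate(batch.annotations)  # rights, jurisdiction, tier
    if not decision.admitted:
        # Structural exclusion: no gradient is ever computed, and the
        # exclusion event itself enters the lineage.
        lineage.log_exclusion(batch.source_id, reason=decision.reason)
        return

    snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
    loss = model.loss(batch)
    loss.backward()
    optimizer.step()

    # Retention gate: if the update regresses a previously validated
    # capability, roll it back and log the gating event. (A real system
    # would snapshot and evaluate far less often than every step.)
    if retention_monitor.regressed(model):
        model.load_state_dict(snapshot)
        lineage.log_gate(batch.source_id, reason="retention_regression")
    else:
        lineage.log_update(batch.source_id, profile=decision.profile)
    optimizer.zero_grad()
```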
The economic argument for moving to structural provenance tracks the trajectory of supervisory examinations. Each cycle of regulatory engagement consumes engineering and legal capacity disproportionate to the questions actually being asked, because the institution is reconstructing answers rather than retrieving them. A training pipeline that emits its own evidence collapses the marginal cost of each subsequent examination, fine-tune, and product launch onto the same substrate; that is the only way the pace of model deployment in regulated firms stays sustainable as supervisory expectations tighten.
Compliance mapping
The mapping from mechanism to obligation is direct:

- SR 11-7: provenance tracing satisfies the expectation that the institution understand and document data, assumptions, and limitations, and it supplies the independent-review surface that effective challenge requires.
- EU AI Act Article 9: the governed training loop and risk-tiered routing implement risk management as a continuous lifecycle process rather than a periodic exercise.
- EU AI Act Article 18 and ISO/IEC 42001: tamper-evident lineage records satisfy the record-keeping and auto-generated-logs requirements, and they discharge the ISO/IEC 42001 clauses on documented information and operational planning.
- EU AI Act Article 17: the change-control discipline of the quality management system is met because every training step is admitted by a governance evaluation that is itself logged.
- SR 21-13: ongoing monitoring is supported because the lineage establishes the baseline against which production drift is measured.
- OCC 12 CFR Part 30: the heightened-standards expectations are addressed at the governance layer where they originate.
- NAIC 2023 AI Bulletin: the expectation of insurer accountability for AI-driven decisions is met because every decision can be traced to the training inputs that shaped it.
- EBA discussion paper: the emphasis on interpretability and traceability for ML-based IRB models maps onto the entropy-depth profile, which provides a structural rather than narrative account of how the model came to know what it knows.
- NIST AI RMF: the Govern, Map, Measure, and Manage functions are all supported by the same underlying lineage substrate.
Adoption pathway
The adoption pathway is designed to compose with the model risk management programs that regulated firms already operate, not to replace them. Existing inventories, validation calendars, and challenge protocols remain in force; the governance layer changes the substrate they operate against from narrative to structural.

A regulated institution adopting AQ training governance wraps its existing training pipeline with the governance layer, requiring no change to the model architecture or the optimizer. Each training data source is annotated with rights status, jurisdiction, risk tier, and validation status; the layer enforces these annotations at gradient time, as sketched in the example that closes this section. The first training run produces a complete lineage record that can be ingested directly into the institution's model risk management inventory and surfaced to internal validation and second-line review. Subsequent fine-tuning, retraining, and incremental learning runs append to the lineage rather than replacing it, preserving the audit horizon that EU AI Act Article 18 and internal retention policies require.

Validation teams gain the entropy-depth profile as a new diagnostic and integrate it into their effective-challenge protocols. Ongoing monitoring under SR 21-13 binds to the lineage baseline, so production drift is measured against a structurally evidenced reference rather than a remembered one.

The payoff is sector-agnostic. A pharmaceutical sponsor pursuing FDA submissions for an AI-enabled device gains the data-lineage documentation the agency has signaled it will require; a bank under SR 11-7 gains the documentation, validation evidence, and ongoing-monitoring infrastructure regulators expect, backed by structural provenance rather than self-reported narrative; an insurer under the NAIC Bulletin gains a defensible account of how each model came to behave the way it does.

The governance layer becomes part of the institution's standing infrastructure, and compliance becomes a property of the training system rather than a project that follows it. As supervisory expectations converge across jurisdictions and sectors, the institutions that have already moved their training discipline into structural form will find themselves answering questions that competitors are still scrambling to document, and will absorb new obligations as configuration changes rather than as program crises.
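As a closing illustration, here is one way the source annotations described above might be typed and enforced; a minimal sketch in which SourceAnnotation, RiskTier, and the admissible rule are hypothetical names, and the policy is reduced to sets of permitted values. A real policy engine would be considerably richer, with effective dates, jurisdictional conflicts, and escalation paths.

```python
from dataclasses import dataclass
from enum import Enum

class RiskTier(Enum):
    MINIMAL = "minimal"
    LIMITED = "limited"
    HIGH = "high"  # routed with proportionate scrutiny per EU AI Act Article 9

@dataclass(frozen=True)
class SourceAnnotation:
    """Per-source metadata the governance layer checks at gradient time."""
    source_id: str
    rights_status: str       # e.g. "licensed", "proprietary", "public-domain"
    jurisdiction: str        # e.g. "EU", "US"
    risk_tier: RiskTier
    validation_status: str   # second-line validation state of the source

def admissible(ann: SourceAnnotation, policy: dict) -> bool:
    """Illustrative admission rule: a policy is sets of permitted values."""
    return (
        ann.rights_status in policy["rights_status"]
        and ann.jurisdiction in policy["jurisdictions"]
        and ann.risk_tier.value in policy["risk_tiers"]
        and ann.validation_status in policy["validation_status"]
    )

# Example policy for a domain that excludes high-risk and unvalidated sources.
policy = {
    "rights_status": {"licensed", "proprietary"},
    "jurisdictions": {"EU", "US"},
    "risk_tiers": {"minimal", "limited"},
    "validation_status": {"validated"},
}
```

The design point is that this check runs inside the loop, per update, so admissibility is a property of the gradient step itself rather than of a review that happens before or after the run.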