Training Governance for Medical AI

by Nick Clark | Published March 27, 2026

Medical AI is a regulated medical device whose principal manufacturing process is training. The FDA AI/ML SaMD Action Plan, the Predetermined Change Control Plan framework, 21 CFR Part 11, IEC 62304, ISO 13485, ISO 14971, EU MDR Annex VIII read with Article 27, the EU AI Act Article 10, the NIST AI Risk Management Framework as applied to healthcare, and ISO/IEC 42001 each impose obligations on what the model learned, from which evidence, with what verification, and how change is controlled across the model lifecycle. Current training pipelines treat all training data uniformly and produce a model whose internal knowledge cannot be decomposed into auditable components. Training governance provides depth-selective gradient routing, entropy-based training profiles, and provenance-traceable training dynamics that align medical AI training with the regulatory expectations that govern its deployment.


Regulatory framework

The FDA AI/ML SaMD Action Plan establishes the agency's regulatory orientation toward continuously learning medical AI: a Total Product Lifecycle approach in which the manufacturer commits, prospectively, to a change-control regime governing how the model may evolve after authorization. The Predetermined Change Control Plan, formalized in subsequent FDA guidance, requires the manufacturer to specify in advance the modifications that may be made to the model, the protocol under which those modifications will be implemented and verified, and the impact assessment that demonstrates the modifications remain within the authorized risk profile. A PCCP cannot be honored by a training pipeline that cannot describe, at the gradient level, what a given training update changed and why.

21 CFR Part 11 governs electronic records and electronic signatures for FDA-regulated systems and imposes integrity, attribution, and audit-trail expectations on the records that document a regulated process. Training is such a process, and the records of training (the data used, the parameters updated, the validation performed) fall within scope. IEC 62304 is the international standard for medical device software lifecycle processes, requiring documented planning, risk-classified activities, and traceability from requirements through implementation to verification. ISO 13485 specifies the quality management system requirements for medical device manufacturers, and a training pipeline is part of the manufacturing process under that QMS. ISO 14971 governs the application of risk management to medical devices, and the risks attributable to model behavior are risks attributable to training.

EU MDR Annex VIII, read with Article 27 on the Unique Device Identification system, anchors the post-market identification and traceability obligations that EU-marketed medical AI must satisfy. Article 10 of the EU AI Act, addressing data and data governance for high-risk AI systems, requires that training, validation, and testing data sets meet specified quality criteria, including relevance, representativeness, freedom from errors to the greatest extent possible, and statistical properties appropriate to the intended purpose. The provision is not satisfied by curation alone; it requires that the manufacturer can demonstrate, on examination, that the data governance was actually applied. The NIST AI Risk Management Framework provides the U.S. voluntary structure within which healthcare AI risk is increasingly being assessed by payers, health systems, and federal procurement, and ISO/IEC 42001 specifies the AI management system standard that healthcare organizations are beginning to adopt as the systemic counterpart to ISO 13485.

Architectural requirement

The convergent architectural requirement across these regimes is that the training process must be a governed, decomposable, evidentially traceable manufacturing operation rather than an opaque statistical aggregation. The regulator must be able to ask, of any specific clinical behavior the model exhibits, which training examples most influenced that behavior, what the evidence grade of those examples was, what the lawful basis for processing them was, and what the change-control posture is for modifying that behavior in the future. The PCCP requires that the manufacturer can answer these questions prospectively about modifications not yet performed; the AI Act Article 10 requires that the manufacturer can answer them retrospectively for the authorized model.

Such a system must operate at the gradient level, because the gradient is the unit at which a training example actually changes a model parameter. It must carry source, evidence-grade, population, and consent metadata into the training loop and use that metadata to govern the depth and magnitude of the resulting parameter change. It must detect memorization, because memorization of patient-specific information is a privacy and safety risk that data de-identification only partly addresses. And it must produce an immutable, Part 11-compliant record sufficient to support FDA premarket review, EU MDR conformity assessment, and ongoing AI Act and ISO/IEC 42001 audit cadences.
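As a concrete illustration, the metadata that must travel with each training example can be sketched as a small record type. The field names and the grade scale below are assumptions chosen for exposition, not a published schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the record is immutable once curated
class ExampleMetadata:
    source_id: str       # provenance identifier for the originating source
    evidence_grade: str  # e.g. "rct" | "observational" | "case_report" (illustrative scale)
    population: str      # patient population characteristics
    lawful_basis: str    # GDPR lawful-basis or consent reference

# Example: a finding curated from a hypothetical randomized controlled trial
meta = ExampleMetadata(
    source_id="src-001",
    evidence_grade="rct",
    population="adult inpatient",
    lawful_basis="consent-ref-17",
)
```

Making the record frozen means curation-time decisions cannot be silently altered inside the training loop, which is the attribution property the surrounding text calls for.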

Why procedural compliance fails

The dominant compliance response to medical AI training obligations relies on a stack of procedural controls layered around an ungoverned training core: a curated dataset, a documented data card, a model card, a held-out validation set, a clinical evaluation plan, and a change-control SOP. Each of these controls describes an aspect of the manufacturing process at the level of inputs, outputs, and policy. None of them describes, or governs, the manufacturing process itself, which is the gradient-level update of model parameters in response to specific training examples. The training pipeline at the core of this stack treats every training example uniformly: a finding from a large randomized controlled trial and a finding from a single case report receive training signal proportional to their representation in the corpus, not proportional to their evidence grade. The model that emerges cannot distinguish between well-established clinical knowledge and preliminary findings because the training process did not encode the distinction.

For PCCP, this is fatal. A change-control plan that commits to modifying the model in defined ways cannot be honored by a pipeline that cannot describe what a modification changed at the parameter level. The manufacturer can show the data that went in and the validation metrics that came out, but cannot show the relationship between the modification and the model behaviors it was supposed to alter. Under FDA examination, that gap is the gap between a controlled manufacturing process and an uncontrolled one.

For 21 CFR Part 11 and IEC 62304 traceability, procedural compliance produces a documentation surface but not a provenance substrate. The data card identifies the training corpus, but the model behaviors cannot be traced back to their training origins. Under audit, the manufacturer can describe the inputs and the outputs but not the causal path between them. For ISO 14971 risk management, the same problem appears: a clinical risk identified post-deployment cannot be remediated at training time without a traceable connection between the risk and the training examples that produced it. For AI Act Article 10, curation can be documented but the actual application of data-governance decisions to gradient-level outcomes cannot be evidenced. Procedural compliance, in each of these regimes, produces a paper record about a process whose internal mechanics remain opaque, and that opacity is precisely what the regulatory expectations are designed to eliminate.

What the AQ primitive provides

Training governance, in the Adaptive Query architecture, inserts depth-selective gradient routing, entropy-based training profiles, and provenance tracing directly into the training loop. Each training example carries metadata established at curation time: evidence grade on a defined scale, source provenance with regulatory-compliant identifiers, patient population characteristics, and lawful-basis annotations. The gradient routing mechanism reads this metadata at each training step and controls the depth and magnitude of the resulting parameter update. Findings from large-scale randomized controlled trials route gradients to deeper model layers, establishing foundational clinical knowledge. Findings from observational studies route to intermediate layers with moderated gradient magnitude. Case reports and preliminary findings route to surface layers with minimal depth, informing pattern recognition without establishing deep clinical commitments. Evidence grade is encoded into the model's parameter geometry, not asserted in a data card.
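A minimal sketch of the routing idea follows, assuming gradients are held in a per-layer dictionary with layer 0 as the deepest layer. The grade names, layer cutoffs, and scale factors are hypothetical, not the AQ implementation:

```python
# Illustrative grade -> (shallowest permitted depth, gradient scale) table.
# Layer 0 is the deepest layer; higher indices are closer to the surface.
EVIDENCE_ROUTING = {
    "rct": (0, 1.0),            # large RCTs: all depths, full magnitude
    "observational": (4, 0.5),  # observational: intermediate layers, moderated
    "case_report": (8, 0.1),    # case reports: surface layers, minimal magnitude
}

def route_gradients(grads, evidence_grade):
    """Zero out gradients deeper than the evidence grade permits, scale the rest.

    grads: {layer_index: gradient_value} for one training example.
    """
    min_layer, scale = EVIDENCE_ROUTING[evidence_grade]
    routed = {}
    for layer, g in grads.items():
        if layer < min_layer:
            routed[layer] = 0.0        # too deep for this evidence grade
        else:
            routed[layer] = g * scale  # permitted depth, moderated magnitude
    return routed
```

For example, a case-report gradient touching layers 0, 5, and 9 would survive only at layer 9, and there at one tenth of its raw magnitude; an RCT gradient passes through unchanged.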

Entropy-based training profiles continuously assess whether the model is learning generalizable patterns or memorizing specific examples. Memorization onset triggers an automated intervention that prevents the model from encoding patient-specific details that should remain private under HIPAA and GDPR. This addresses a privacy risk that de-identification alone does not eliminate: re-identification attacks can operate on what the model memorized, a surface that de-identification of the input records never sees. Provenance tracing records the relationship between training examples, gradient updates, and resulting parameter changes, producing a queryable trace from any model behavior back to the training examples most responsible for it, with their evidence grade and gradient depth attached.
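One way such a memorization check could work is a Shannon-entropy heuristic: when the model's predictive distribution on a single training example collapses far below the batch norm, that example is a candidate for intervention. The threshold and interface below are illustrative assumptions, not the AQ mechanism:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (nats) of a discrete predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class MemorizationMonitor:
    """Flag examples whose prediction entropy collapses far below the batch mean."""

    def __init__(self, ratio=0.25):  # illustrative threshold, not a tuned value
        self.ratio = ratio

    def check(self, example_probs, batch_mean_entropy):
        # True means "intervene": this example's distribution is far more
        # confident than the batch norm, a common memorization signature.
        return shannon_entropy(example_probs) < self.ratio * batch_mean_entropy
```

A near-one-hot distribution on a training example, against a batch whose mean entropy is still near uniform, would trip the check; a distribution matching the batch norm would not.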

The combined record is the substrate on which a Predetermined Change Control Plan can actually be executed. A change to the model is described not as a retraining run but as a defined modification to specified depth strata under specified evidence-grade constraints, with the change record produced by the routing mechanism itself. The same substrate satisfies 21 CFR Part 11 attribution and audit-trail requirements at the level of the manufacturing process rather than the documentation surface, and supplies the IEC 62304 traceability spine from clinical requirement through training example through gradient update through validated behavior. ISO 14971 risk management becomes operative on the training side as well as the deployment side: a clinical risk identified after deployment can be remediated at training time by re-routing the gradients of the training examples that produced the risk, without retraining the model from scratch and without disturbing the deeply established clinical knowledge that should remain stable.
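The immutability property the change record depends on can be illustrated with a hash-chained, append-only log, a standard technique for tamper-evident audit trails; the entry fields below are hypothetical, chosen to mirror the routing metadata discussed above:

```python
import hashlib
import json

class ProvenanceLog:
    """Append-only, hash-chained record of training updates (Part 11-style sketch).

    Each entry binds an example id, its evidence grade, and the layers it
    touched to the hash of the previous entry, so any later alteration of
    an earlier entry breaks the chain.
    """

    def __init__(self):
        self.entries = []

    def _digest(self, body):
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, example_id, evidence_grade, layers_touched):
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"example_id": example_id, "evidence_grade": evidence_grade,
                "layers": sorted(layers_touched), "prev": prev}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """Recompute the chain; any tampered entry makes this return False."""
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("example_id", "evidence_grade", "layers", "prev")}
            if e["prev"] != prev or e["hash"] != self._digest(body):
                return False
            prev = e["hash"]
        return True
```

Editing any recorded field after the fact, for instance upgrading a case report's evidence grade, invalidates every subsequent hash, which is the attribution guarantee a Part 11 audit trail is expected to provide.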

Compliance mapping

Against the FDA AI/ML SaMD Action Plan and the PCCP framework, training governance supplies the gradient-level change description that makes a Predetermined Change Control Plan executable rather than aspirational. Against 21 CFR Part 11, the immutable provenance record provides the attribution and audit trail expected of regulated electronic records at the manufacturing layer. Against IEC 62304, the routing-and-provenance substrate furnishes the traceability spine that the standard requires from requirements to verification. Against ISO 13485, the training pipeline is brought inside the QMS as a controlled manufacturing process rather than an opaque step. Against ISO 14971, training-side risk remediation becomes available as a complement to deployment-side controls. Against EU MDR Annex VIII and Article 27, the per-update provenance supports the post-market identification and traceability obligations that follow the device through its lifecycle. Against EU AI Act Article 10, the data-governance posture is realized as enforced gradient routing rather than asserted curation, with evidence on demand. Against the NIST AI RMF and ISO/IEC 42001, the governed training substrate provides the management-system evidence that the framework and the standard expect from a mature healthcare AI program.

Adoption pathway

A medical AI development organization adopting training governance begins by annotating its existing training corpus with evidence-grade, provenance, population, and lawful-basis metadata, drawing on the curation work already required by current procedural controls. The training pipeline is then instrumented with the gradient-routing, entropy-profile, and provenance-tracing primitives, replacing the uniform-update training core with a governed one. Initial deployment targets a single device family approaching premarket submission, where the regulatory benefit is most concrete and the documentation burden of procedural-only compliance is most acute. The provenance substrate is then extended across the manufacturer's portfolio, becoming the QMS-integrated training infrastructure that ISO 13485, IEC 62304, and ISO/IEC 42001, taken together, expect a modern medical AI manufacturer to operate.
