Healthcare AI Admissibility Before Clinical Output

by Nick Clark | Published March 27, 2026

A radiology AI reports a finding inconsistent with the patient's clinical history. A drug-interaction checker recommends a contraindicated medication. A clinical decision support module suggests a regimen outside the patient's formulary or institutional protocol. In each case, the AI produced a clinically inadmissible output, and that output reached a clinician. Under FDA 21 CFR 820 design controls, 21 CFR Part 11 attribution, the FDA AI/ML SaMD Action Plan with its Predetermined Change Control Plan framework, IEC 62304 software lifecycle requirements, ISO 14971 risk management, EU MDR clinical-performance obligations, ANSI/AAMI/UL 2900-1 cybersecurity expectations, the NIST AI Risk Management Framework as applied to healthcare, and the HHS AI Strategic Plan, the inadmissible output is the regulated event. Inference control prevents the event by evaluating clinical admissibility at the semantic transition level, before the output exists, so that every clinical recommendation is consistent by construction with patient context, clinical guidelines, and institutional policy.


Regulatory framework

Clinical AI sits at the intersection of several regulatory regimes that all attach liability to the output the clinician sees. FDA 21 CFR 820 (the Quality System Regulation, harmonized to ISO 13485 under the QMSR) requires design verification against design inputs and design validation that the device meets defined user needs and intended uses. 21 CFR Part 11 requires that electronic records used in regulated decisions be attributable, contemporaneous, original, accurate, and durable. The FDA AI/ML SaMD Action Plan and the Predetermined Change Control Plan (PCCP) framework require that any post-market modifications to a learning system be bounded by SaMD Pre-Specifications and disciplined by an Algorithm Change Protocol. IEC 62304 partitions medical device software into safety classes A, B, and C, with Class C requiring formal architectural design, detailed unit verification, and integration testing for software whose failure may cause death or serious injury. ISO 14971 requires hazard analysis, risk estimation, and risk control, with the residual-risk acceptability argument grounded in the effectiveness of those controls.

EU MDR adds clinical evaluation, post-market clinical follow-up, and Notified Body oversight. ANSI/AAMI/UL 2900-1 defines cybersecurity baseline expectations for connected medical devices, mapped against the FDA premarket cybersecurity guidance and the AAMI TIR57 hazard analysis approach. The NIST AI RMF, in its healthcare profile, structures governance around Govern, Map, Measure, and Manage functions and explicitly identifies context-dependent admissibility as a measurement obligation. The HHS AI Strategic Plan calls for governance of clinical AI that is transparent, traceable, and consistent with the duty of care.

Across all of these, the regulated artifact is not the model. It is the output that influences a clinical decision. The duty is to constrain that output, demonstrate the constraint, and surveil it post-market.

Architectural requirement

The regulatory frame implies a specific architectural property: clinical admissibility must be a structural property of the output rather than a probabilistic property of the model that produced it. ISO 14971 risk control is more defensible when the control is a deterministic gate than when it is a statistical claim about a model's accuracy distribution. PCCP boundaries are stable when they describe a gate that survives model retraining and brittle when they describe a model whose behavior shifts with each update. Part 11 attribution is straightforward when each output carries the chain of admissibility checks that produced it and difficult when admissibility is litigated post hoc. NIST AI RMF measurement is operationalizable when admissibility events are first-class telemetry and aspirational when they are not.

This architectural property cannot be added on top of a generative model. It must be in the inference path. The admissibility evaluation must occur at the semantic transition level, within the loop that produces the output, with the patient's persistent state as a first-class input to the loop.
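
The distinction is easiest to see in code. The sketch below is illustrative only; propose, the transition dictionaries, and the state fields are assumptions for the example, not the AQ interface. A post-generation filter receives a finished output; the gated loop evaluates each candidate transition before it commits.

```python
def admissible(candidate: dict, state: dict) -> bool:
    """Deterministic gate predicate over one proposed transition."""
    drug = candidate.get("drug")
    if drug is None:
        return True  # non-drug transitions pass this particular predicate
    return (drug not in state["allergies"]
            and drug not in state["contraindications"])

def gated_inference(propose, state: dict, max_steps: int = 50):
    """In-loop gating: propose(output, state) yields ranked candidate
    transitions; a transition commits only after it passes the gate."""
    output = []
    for _ in range(max_steps):
        for candidate in propose(output, state):
            if admissible(candidate, state):
                output.append(candidate)  # commit the gated transition
                break
        else:
            return output, "deferred"     # no admissible continuation exists
        if output[-1].get("terminal"):
            break
    return output, "complete"
```

A post-generation filter, by contrast, receives only the finished output and whatever context the caller happens to forward; here the inadmissible transition is never appended in the first place.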

The admissibility problem in clinical AI

Clinical AI systems routinely produce outputs that are technically correct but clinically inadmissible. A diagnostic model identifies a finding accurately but does not account for the patient's contraindications, prior treatments, or institutional protocol. A documentation assistant generates a coherent summary that imports an assertion the chart does not support. A sepsis predictor fires on a population baseline that the patient's chronic condition shifts. The output is medically defensible in isolation. In context it is inadmissible — and it is the contextual reading, not the isolated reading, that the clinician acts on and that liability attaches to.

Current practice addresses this through post-generation review: the AI produces an output and the clinician is expected to evaluate its admissibility in context. The premise holds when clinicians have time and cognitive headroom to scrutinize every output. In emergency departments running double-digit hourly throughput, in radiology reading rooms with thousand-study days, in primary-care panels with fifteen-minute slots, and in inpatient pharmacy queues with hundreds of order verifications per shift, the premise breaks. The cognitive economics force clinicians toward acceptance, and the inadmissible output reaches the patient.

Why procedural compliance fails

The procedural answer to admissibility is documentary and operational. Document the intended use. Validate against a clinical-performance benchmark. Train clinicians to scrutinize outputs. Add a safety filter that catches obviously dangerous recommendations. Surveil adverse events post-market. Each of these is necessary; none is sufficient.

Safety filters fail because admissibility is contextual, not categorical. A recommendation safe for one patient is inadmissible for another whose history, allergies, formulary, or care plan differs. A filter operating on the output in isolation cannot evaluate the output against context it never receives. Clinician scrutiny fails because the volume-to-attention ratio in clinical settings does not support per-output critical evaluation. Validation fails because the validation set cannot anticipate every contextual configuration the deployed system will encounter; clinical-performance benchmarks demonstrate average-case adequacy but not contextual admissibility on the long tail. Post-market surveillance fails because by the time the inadmissible output is identified as such, it has already influenced a clinical decision.

The deeper failure is that procedural compliance describes intended behavior at a point in time while the system continues to produce outputs in real time. The PCCP describes what may change; it does not constrain any individual output. ISO 14971 risk-control measures presume an effective control; they do not implement one. The procedural artifacts are necessary preconditions for clearance, but they cannot, by themselves, prevent the inadmissible output from existing.

What the AQ primitive provides

Inference control evaluates every candidate clinical inference against the patient's persistent state before the inference is committed. The patient state object is a first-class governed entity carrying current medications, known allergies, contraindications, treatment history, active care plan, formulary, prior authorizations, institutional protocol, and the trust slope binding accumulated context across encounters. Every transition the model proposes — a finding, a recommendation, a code, an order, a summary assertion — is evaluated against this state at the semantic level.
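
Read as a data structure, that paragraph might look like the following sketch. The field names and the TrustSlope stand-in are assumptions for illustration, not the actual AQ schema; the frozen dataclass is one way to express that the object is governed rather than casually mutable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrustSlope:
    """Illustrative stand-in for the binding of accumulated context
    across encounters."""
    encounter_ids: tuple[str, ...]
    weight: float

@dataclass(frozen=True)
class PatientState:
    """Governed patient state: versioned, attributable, and supplied as
    a first-class input to every admissibility evaluation."""
    patient_id: str
    version: int                        # governed-object version
    medications: frozenset[str]         # current medications
    allergies: frozenset[str]           # known allergies
    contraindications: frozenset[str]
    treatment_history: tuple[str, ...]  # prior treatments, ordered
    care_plan: str                      # active care plan identifier
    formulary: frozenset[str]           # covered drugs
    prior_authorizations: frozenset[str]
    protocol: str                       # institutional protocol identifier
    trust_slope: TrustSlope
```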

A drug-recommendation transition is evaluated against the medication list for interaction, the allergy record for contraindication, the formulary for coverage, the active diagnosis list for indication, and the institutional protocol for sequencing. If any admissibility check fails, the transition does not commit. The inference engine explores an alternative transition that satisfies all constraints, or it surfaces an explicit deferral with the failed-check provenance attached. The inadmissible recommendation is never produced. The output that reaches the clinician has already passed every admissibility gate at every step of its generation, and the chain of checks is the audit record.
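
As a sketch, with check names and state fields invented for the example: each predicate mirrors one clause above, every evaluation is recorded, and a candidate that fails any check yields its failed-check provenance instead of an output.

```python
# Hypothetical check chain for a drug-recommendation transition.
CHECKS = {
    "interaction":      lambda t, s: t["drug"] not in s["interacting_drugs"],
    "contraindication": lambda t, s: t["drug"] not in s["allergies"],
    "formulary":        lambda t, s: t["drug"] in s["formulary"],
    "indication":       lambda t, s: t["indication"] in s["active_diagnoses"],
    "protocol":         lambda t, s: t["line"] <= s["protocol_max_line"],
}

def evaluate(transition: dict, state: dict):
    """The transition commits only if every check passes; otherwise the
    per-check results are the provenance the deferral surfaces."""
    provenance = {name: check(transition, state)
                  for name, check in CHECKS.items()}
    return all(provenance.values()), provenance

state = {"interacting_drugs": {"warfarin"}, "allergies": set(),
         "formulary": {"apixaban"}, "active_diagnoses": {"afib"},
         "protocol_max_line": 2}
ok, why = evaluate({"drug": "apixaban", "indication": "afib", "line": 1}, state)
# ok is True here; a failing candidate would instead return False with
# `why` identifying exactly which checks failed.
```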

The rights-governance layer ensures the AI operates within its authorized clinical scope. A diagnostic agent authorized for imaging interpretation cannot generate treatment recommendations, not because a filter blocks the output but because treatment-recommendation transitions are outside the agent's authorized inference scope. The scope is itself a governed object, versioned, attributable, and modifiable only through the change-control surface that the institution's quality system describes. This converts the FDA 510(k) or De Novo intended-use statement from a description into an enforced boundary.
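
A sketch of the same idea for scope, under the same caveat that every name is hypothetical: the authorized scope is data, versioned like any other governed object, and an out-of-scope transition type never becomes a candidate at all.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceScope:
    """Governed authorization: which transition types this agent may
    produce. Modifying it goes through the change-control surface."""
    agent_id: str
    version: int
    allowed_transitions: frozenset[str]

def within_scope(candidates, scope: InferenceScope):
    """Out-of-scope candidates are dropped before they exist as output;
    there is no treatment recommendation left for a filter to block."""
    for candidate in candidates:
        if candidate["type"] in scope.allowed_transitions:
            yield candidate

imaging_scope = InferenceScope(
    agent_id="radiology-agent",
    version=3,
    allowed_transitions=frozenset({"finding", "measurement"}),
)
# A candidate with type "treatment_recommendation" is simply never yielded.
```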

Inference control is not a post-generation filter. The admissibility evaluation is in the inference loop. This is the architectural shift that makes ISO 14971 risk control a deterministic claim, that gives the PCCP a stable surface to constrain, that makes Part 11 attribution end-to-end, and that converts NIST AI RMF measurement obligations into operational telemetry.

Compliance mapping

Inference control maps directly onto multiple cells of the compliance matrix. For FDA 21 CFR 820 design controls, the admissibility gate is a verified design output whose verification record is the deterministic check chain. For 21 CFR Part 11, every output carries attributable, contemporaneous, durable provenance (the chain of gates that approved each transition), satisfying attribution and audit-trail expectations without separate logging infrastructure. For the FDA AI/ML SaMD Action Plan and PCCP framework, the gate is the regulated boundary; SaMD Pre-Specifications constrain what may flow through the gate, and the Algorithm Change Protocol disciplines modifications to the gate logic, not the underlying model. The model can be retrained on schedule without re-clearance because the regulated surface is invariant.

For IEC 62304, the gate logic is amenable to Class C unit and integration verification because it is deterministic; the model can remain Class B with the gate carrying the higher-class assurance. For ISO 14971, hazard-analysis risk-control measures map onto specific gate predicates, and residual-risk arguments are grounded in the verified effectiveness of those predicates. For EU MDR clinical evaluation, the clinical-performance argument is supported by the gate's structural prevention of inadmissible outputs in addition to the model's measured accuracy. For ANSI/AAMI/UL 2900-1, the gate is part of the cybersecurity boundary because it constrains what an exploited model can emit. For the NIST AI RMF Measure function, the gate produces first-class admissibility telemetry — pass rates, deferral rates, gate-version distributions — that the Manage function can act on. For the HHS AI Strategic Plan, the architecture realizes the transparency and traceability obligations as system properties rather than reporting commitments.
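
Because every transition passes through the gate, the Measure-function telemetry falls out of normal operation. A sketch, with the event fields assumed for illustration rather than taken from a defined schema:

```python
from collections import Counter

def summarize(gate_events):
    """gate_events: one record per evaluated transition, emitted by the
    gate itself, e.g. {"result": "pass", "gate_version": "1.4.2"}
    (field names are assumptions)."""
    results = Counter(e["result"] for e in gate_events)
    versions = Counter(e["gate_version"] for e in gate_events)
    total = sum(results.values()) or 1
    return {
        "pass_rate": results["pass"] / total,
        "deferral_rate": results["defer"] / total,
        "gate_version_distribution": dict(versions),
    }
```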

Adoption pathway

Adoption begins where the regulatory exposure is highest and the integration cost is lowest. The first deployment is typically a wrapper around an existing CDS Hooks endpoint or drug-interaction checker, where the admissibility gate is inserted between the model and the EHR write surface. The wrapper requires no model retraining; it gives the institution an immediate auditable boundary that the FDA submission, the ISO 14971 file, and the Part 11 validation package can all describe.
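
A minimal sketch of that wrapper, assuming a CDS Hooks-style service: the card fields follow the HL7 CDS Hooks response shape, while upstream, load_state, and admissible are stand-ins for the institution's own pieces, not a published API.

```python
def gated_cds_response(request: dict, upstream, load_state, admissible) -> dict:
    """Sits between the existing CDS Hooks service and the EHR write
    surface; the upstream model is called unmodified."""
    state = load_state(request["context"]["patientId"])
    cards = []
    for card in upstream(request).get("cards", []):
        ok, provenance = admissible(card, state)   # deterministic gate
        if ok:
            cards.append(card)                     # admissible card passes
        else:
            cards.append({                         # explicit gated deferral
                "summary": "Recommendation withheld by admissibility gate",
                "indicator": "warning",
                "source": {"label": "inference gate"},
                "detail": f"Failed checks: {provenance}",
            })
    return {"cards": cards}
```

The auditable boundary the submission describes is then this one function: every card the EHR receives either passed the gate or records why it did not.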

The second deployment extends gating to imaging interpretation and ambient-scribe assertion checking, where the admissibility constraints are richer and the gate predicates are co-developed with the institution's quality and risk-management functions. PCCPs for these models are filed or amended to describe the gate as the regulated surface and to scope the SaMD Pre-Specifications around it. EU MDR clinical evaluations reference the gate as a structural risk control. The institution begins to consolidate audit and surveillance pipelines on the gate's telemetry rather than on per-model logging.

The third deployment generalizes the gate across the full clinical AI portfolio, including documentation, coding, and predictive surveillance models. At this stage the institution's NIST AI RMF program becomes operationally grounded: Govern policies describe the gate, Map activities inventory transitions, Measure activities consume gate telemetry, and Manage activities act on it. Vendor procurement standards begin to require gate compatibility, and the institution's clinical-AI governance committee shifts from per-tool review to portfolio-level surveillance of admissibility events.

For health systems, the practical effect is that clinicians review outputs that are admissible rather than filtering for admissibility themselves. For health-technology vendors, inference control provides the clinical governance architecture that FDA clearance, MDR conformity assessment, and institutional procurement increasingly demand: demonstrable, structural evidence that the system cannot produce outputs inconsistent with the patient's clinical context.
