LLM and Skill Gating for Medical Licensing
by Nick Clark | Published March 27, 2026
Medical licensing exists because patient safety requires that practitioners demonstrate competence before practicing. AI medical systems bypass this principle entirely: they are deployed based on aggregate performance metrics without individual capability assessment, and they continue operating without competence monitoring. LLM and skill gating applies the licensing principle to medical AI through curriculum-based progressive capability unlocking where each clinical capability is earned through demonstrated evidence, regression detection that catches declining competence, and certification tokens that structurally authorize the specific clinical tasks the system has proven it can perform safely. This article maps the AQ skill-gating primitive disclosed under provisional 64/049,409 onto the regulatory framework administered by the Federation of State Medical Boards (FSMB), the National Board of Medical Examiners (NBME), the United States Medical Licensing Examination (USMLE) program, the American Board of Family Medicine (ABFM) and analogous specialty boards, the FDA Software-as-a-Medical-Device pathway, HIPAA, and the EU AI Act Annex III high-risk regime that explicitly enumerates medical decision support.
1. Regulatory and Compliance Framework
Human medical licensing in the United States is administered through a layered structure. The FSMB coordinates state medical boards, each of which issues and enforces individual licenses to practice medicine. The USMLE — jointly sponsored by FSMB and NBME — is the three-step examination sequence (Step 1 basic science, Step 2 CK clinical knowledge and Step 2 CS clinical skills as historically administered, Step 3 independent practice) that gates eligibility for state licensure. Specialty competence is governed separately by member boards of the American Board of Medical Specialties (ABMS): ABFM for family medicine, ABIM for internal medicine, ABS for surgery, and so on, each with initial certification and Maintenance of Certification (MOC) requirements. Continuing Medical Education (CME) credits documented to state boards are required for license renewal. The structural insight is that human medical authority is not granted globally; it is decomposed into examined and re-examined sub-capabilities, each individually credentialed and individually revocable.
AI clinical systems have, until recently, sat almost entirely outside this framework. The FDA regulates clinical decision support and Software-as-a-Medical-Device (SaMD) under 21 CFR 820 quality system regulation, the De Novo and 510(k) pathways, and the FDA Pre-Cert / Predetermined Change Control Plan guidance, but FDA clearance certifies a device, not a continuing competence relationship analogous to license renewal. HIPAA's Privacy and Security Rules govern PHI handling but do not speak to competence at all. The EU AI Act enumerates medical AI as Annex III high-risk, requiring risk management, data governance, technical documentation, logging, transparency, human oversight, accuracy, robustness, and cybersecurity — but the Act presumes a static system whose properties were demonstrated at conformity assessment, not a clinical agent whose authority is structurally bounded and continuously re-validated. Joint Commission accreditation, CMS Conditions of Participation, and state hospital licensure all impose duties on the institution deploying AI but provide no framework analogous to the individual practitioner license that the AI is functionally replacing.
The compliance gap is therefore not a missing rule. It is a missing architecture. The rules for human practitioners assume an entity that holds individually named, individually evidenced, individually revocable capabilities, and the rules for AI assume an entity whose properties are static and globally certified. Skill gating is the architectural primitive that lets an AI clinical system present the surface that the human-licensure regulatory framework already knows how to govern.
2. Architectural Requirement
The architectural property required is structural decomposition of clinical authority into named capabilities, each independently gated by evidence, each independently monitored, and each independently revocable, with cryptographic certification tokens that an external regulator or institution can verify. The medical AI system must be incapable — not merely unwilling — of performing a clinical action for which it does not hold a valid, unrevoked capability token. Authorization must be enforced at the architectural floor of the system, not at the policy or prompt layer.
This requirement decomposes into five architectural commitments. First, capability namespace: every clinical action the system can take is named within a published taxonomy aligned to existing clinical ontologies (SNOMED CT, ICD-10-CM, RxNorm, LOINC, CPT) and to USMLE/ABMS task categorizations. Second, evidence binding: each capability token is bound to a portfolio of evidence — case-level accuracy, edge-case handling, calibration, abstention behavior, demographic-subgroup performance — that the gate evaluator signed. Third, structural enforcement: the inference and actuation pipeline rejects any output that would constitute exercise of an ungated capability, regardless of how the request was framed. Fourth, continuous re-evaluation: capability tokens carry expirations and are re-issued only when current performance evidence is presented, mirroring license renewal and MOC. Fifth, graduated revocation: declining performance triggers a defined sequence — supervised mode, restricted mode, suspension, revocation — rather than a binary kill switch.
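The second, third, and fourth commitments can be made concrete with a minimal sketch. Everything here is illustrative: the class names (`CapabilityToken`, `GatedRuntime`), the HMAC-based signing, and the token fields are assumptions standing in for whatever signature scheme and taxonomy a real deployment would use, not the disclosed implementation.

```python
"""Sketch: evidence-bound, expiring capability tokens enforced at actuation.

Illustrative only -- HMAC stands in for a real signature scheme, and all
names are hypothetical.
"""
from dataclasses import dataclass
import hashlib
import hmac
import time

# Stand-in for the gate evaluator's signing key.
SIGNING_KEY = b"gate-evaluator-demo-key"


@dataclass(frozen=True)
class CapabilityToken:
    capability: str      # name from the published taxonomy, e.g. "dx:I21"
    evidence_hash: str   # hash of the evidence portfolio the gate signed
    expires_at: float    # renewal deadline, mirroring license renewal / MOC
    signature: str

    @staticmethod
    def issue(capability: str, evidence_hash: str, ttl_seconds: float):
        expires = time.time() + ttl_seconds
        msg = f"{capability}|{evidence_hash}|{expires}".encode()
        sig = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
        return CapabilityToken(capability, evidence_hash, expires, sig)

    def valid(self) -> bool:
        msg = f"{self.capability}|{self.evidence_hash}|{self.expires_at}".encode()
        expected = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(self.signature, expected) and \
            time.time() < self.expires_at


class GatedRuntime:
    """Actuation layer: structurally rejects any clinical action for which
    no valid, unexpired token is held, regardless of how it was requested."""

    def __init__(self, tokens):
        self._tokens = {t.capability: t for t in tokens}

    def authorize(self, capability: str) -> CapabilityToken:
        tok = self._tokens.get(capability)
        if tok is None or not tok.valid():
            raise PermissionError(f"no valid token for {capability!r}")
        return tok
```

The point of the sketch is the structural relationship: the runtime cannot execute an ungated capability because authorization is checked at the actuation floor, not at the prompt layer.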
These commitments are architectural, not procedural. A system that documents capabilities in a markdown file but enforces nothing is not skill-gated. A system whose enforcement layer can be bypassed by prompt engineering is not skill-gated. The structural test is whether the system, presented with a clinical task outside its credentialed envelope, can execute it; if it can, the gate is decorative.
3. Why Procedural Compliance Fails
The dominant industry response to medical AI governance is procedural: model cards, intended-use statements, validation studies submitted to FDA, post-market surveillance plans, hospital AI governance committees, and deployment policies that instruct clinicians how to use the system. Each of these is valuable, and none of them solves the structural problem. A model card describes what the system was tested on; it does not constrain what the system will do. An intended-use statement is enforceable against the manufacturer in product liability, not against the running model in a 2 a.m. emergency department session. FDA clearance of a device does not bind the deployed instance to operate within the cleared envelope when the prompt or input drifts.
Procedural compliance also fails at the temporal axis that medical licensure most carefully addresses. License renewal, CME, and MOC exist because clinical knowledge changes, practitioner skill drifts, and standards of care evolve. AI models exhibit the same drift through covariate shift, label shift, retraining-induced regression, and dependency-graph instability in retrieval-augmented systems. A procedural framework that approves a device once and surveils it through voluntary adverse-event reporting reproduces exactly the failure mode that human medicine spent a century engineering MOC to prevent.
Finally, procedural compliance cannot answer the regulator's most basic question at incident time: at the moment of harm, was this specific system authorized to perform this specific action, and what evidence supported that authorization? A SOC-style attestation, a validation report, and a hospital policy can each be produced; none of them is a runtime authorization token bound to the specific capability exercised. Without that token, the post-incident reconstruction is narrative, not forensic.
4. What the AQ Skill-Gating Primitive Provides
The Adaptive Query skill-gating primitive, disclosed under USPTO provisional 64/049,409, specifies a curriculum-structured capability lattice in which every clinical action a system can take is a named node, each node is gated by an evidence portfolio signed by a credentialed evaluator, and each node carries a certification token that the runtime verifies before actuation. The primitive is technology-neutral: any signature scheme, any evaluation methodology, any model architecture. What it fixes is the structural relationship between evidence, authorization, and execution.
Curriculum order is load-bearing. Diagnostic suggestion for a condition cannot be unlocked before the foundational capabilities it depends on — relevant differential reasoning, calibrated uncertainty, abstention on out-of-distribution presentations — have themselves been unlocked. Treatment recommendation for a condition cannot be unlocked before diagnostic competence on that condition plus pharmacology, contraindication, and patient-factor sub-capabilities. The lattice mirrors the human curriculum structure that USMLE Steps and ABMS specialty examinations encode, which is what makes regulatory mapping tractable.
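A minimal sketch of the load-bearing ordering, under the assumption that the lattice is represented as a prerequisite map. The capability names and dependency sets below are invented for illustration, not drawn from any published taxonomy.

```python
# Hypothetical curriculum lattice: a capability may be unlocked only after
# every prerequisite capability has itself been unlocked. Names are
# illustrative, not from the disclosed taxonomy.
PREREQS = {
    "dx:chest_pain": {"reasoning:differential",
                      "calibration:uncertainty",
                      "abstain:ood"},
    "rx:acs": {"dx:chest_pain",
               "pharm:antiplatelet",
               "contra:bleeding_risk"},
}


def can_unlock(capability: str, unlocked: set) -> bool:
    """True iff all prerequisites of `capability` are already unlocked.
    Capabilities absent from PREREQS are foundational (no prerequisites)."""
    return PREREQS.get(capability, set()) <= set(unlocked)
```

In this representation, treatment recommendation (`rx:acs`) is structurally unreachable until diagnostic and pharmacological prerequisites have passed their own gates, which is the property the text calls load-bearing.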
Evidence portfolios are multi-dimensional. For each capability gate, the portfolio includes: case-level accuracy on a held-out clinical evaluation set; edge-case behavior on adversarial and rare-presentation cases; calibration of stated confidence against observed accuracy; abstention rate on cases that exceed the system's competence; demographic-subgroup performance to detect disparate failure; and behavioral evidence under distribution shift. The gate is passed only when the portfolio meets the published threshold, and the certification token records the portfolio hash so that the evidence basis is forensically reconstructible.
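The gate decision over such a portfolio can be sketched as a threshold check plus a portfolio hash. The threshold values and field names here are invented assumptions; a real gate would use thresholds published by the certifying evaluator.

```python
import hashlib
import json

# Published per-gate thresholds (illustrative values, not the disclosed ones).
THRESHOLDS = {
    "accuracy": 0.92,
    "edge_case_accuracy": 0.85,
    "calibration_ece_max": 0.05,   # expected calibration error ceiling
    "abstention_recall": 0.90,     # must abstain on beyond-competence cases
    "subgroup_accuracy_min": 0.88, # worst demographic subgroup floor
}


def gate_decision(portfolio: dict):
    """Return (passed, portfolio_hash). The hash is recorded in the
    certification token so the evidence basis is forensically
    reconstructible after the fact."""
    passed = (
        portfolio["accuracy"] >= THRESHOLDS["accuracy"]
        and portfolio["edge_case_accuracy"] >= THRESHOLDS["edge_case_accuracy"]
        and portfolio["calibration_ece"] <= THRESHOLDS["calibration_ece_max"]
        and portfolio["abstention_recall"] >= THRESHOLDS["abstention_recall"]
        and min(portfolio["subgroup_accuracy"].values())
            >= THRESHOLDS["subgroup_accuracy_min"]
    )
    digest = hashlib.sha256(
        json.dumps(portfolio, sort_keys=True).encode()).hexdigest()
    return passed, digest
```

Note that the subgroup check thresholds the *worst* subgroup, not the mean, so aggregate performance cannot mask disparate failure.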
Continuous monitoring runs against the same dimensions used for initial gating. Performance on each unlocked capability is sampled in deployment, with appropriate de-identification and HIPAA-compliant logging. When the monitored signal crosses a graduated threshold, the system enters supervised mode (outputs reviewed before clinical action), then restricted mode (capability narrowed to higher-confidence cases), then suspension (capability withdrawn pending re-evaluation), then revocation (capability removed and re-gating required). Every transition is signed and recorded, producing a tamper-evident competence history that mirrors a state board's disciplinary record for a human practitioner.
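The graduated sequence above is a small state machine. A sketch, with an invented hysteresis band so the system does not oscillate between modes; the floor and band values are illustrative assumptions.

```python
# Graduated ladder from the text: full -> supervised -> restricted ->
# suspended -> revoked. Thresholds below are illustrative assumptions.
LADDER = ["full", "supervised", "restricted", "suspended", "revoked"]


def next_mode(current: str, signal: float,
              floor: float = 0.92, band: float = 0.03) -> str:
    """Step one rung down when the monitored signal falls below the floor;
    step one rung back up only when it recovers above floor + band
    (hysteresis). Revocation is terminal: recovery requires re-gating,
    not a monitoring rebound."""
    if current == "revoked":
        return "revoked"
    i = LADDER.index(current)
    if signal < floor:
        return LADDER[i + 1]
    if signal >= floor + band and i > 0:
        return LADDER[i - 1]
    return current
```

Each transition returned by `next_mode` would be signed and appended to the lineage record, producing the tamper-evident competence history the text describes.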
Recursive composition lets the primitive scale to specialty- and sub-specialty-level governance. A general-medicine capability lattice composes with a specialty lattice (cardiology, psychiatry, family medicine) which composes with sub-specialty lattices (electrophysiology, addiction medicine), each with its own gates, evidence thresholds, and renewal cadence. The same primitive operates at every level, which is why a deployment scales by adding lattices rather than by re-architecting.
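Composition can be sketched as merging a specialty prerequisite map under a namespace prefix, with the specialty's root capabilities made dependent on designated general-medicine capabilities. The function and its arguments are hypothetical illustrations of the composition idea, not a disclosed interface.

```python
# Recursive composition sketch: a specialty lattice nests under a general
# lattice by namespacing its capabilities and merging prerequisite maps.
# All names and the merge rule are illustrative assumptions.
def compose(parent_prereqs: dict, child_prereqs: dict,
            prefix: str, entry_requirements: set) -> dict:
    """Merge a child lattice under `prefix`. Child capabilities with no
    prerequisites of their own (the lattice roots) are made to depend on
    the parent capabilities named in `entry_requirements`."""
    merged = dict(parent_prereqs)
    for cap, deps in child_prereqs.items():
        prefixed = {f"{prefix}:{d}" for d in deps}
        merged[f"{prefix}:{cap}"] = prefixed or set(entry_requirements)
    return merged
```

Because the merged structure is itself just a prerequisite map, the same gating and unlocking logic applies at every level, which is the scaling property the paragraph describes.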
5. Compliance Mapping
Each element of the AQ skill-gating primitive maps onto an existing element of the medical regulatory framework. Capability namespace maps to the USMLE content outlines and ABMS specialty content domains, which already enumerate the clinical tasks that examined practitioners must demonstrate. Evidence portfolios map to the USMLE Step examinations, ABMS initial certification examinations, and the case-log and procedure-log requirements that residencies and specialty boards already collect. Certification tokens map to state medical licenses and ABMS board certificates, which are already externally verifiable today through FSMB's Federation Credentials Verification Service and ABMS Certification Matters.
Continuous monitoring and graduated revocation map to MOC, CME-based license renewal, and state-board disciplinary processes. The graduated sequence — supervised, restricted, suspended, revoked — is structurally identical to the disciplinary ladder state boards already operate (letter of concern, practice restriction, suspension, revocation), which is what makes the framework legible to the bodies that would need to govern AI clinical systems under analogous authority.
FDA SaMD compliance is improved rather than displaced. The Predetermined Change Control Plan guidance contemplates manufacturers describing in advance how a model may evolve post-clearance; skill gating provides the structural mechanism by which evolution is bounded and evidenced. Each capability gate is a documented change increment with bound evidence, which fits the PCCP envelope without requiring re-clearance for every increment. Real-world performance monitoring feeds the same continuous-monitoring channel skill gating already requires.
EU AI Act Annex III high-risk obligations map directly. Article 9 risk management is implemented by the curriculum dependency structure that prevents unlocking risky capabilities before prerequisite capabilities are evidenced. Article 10 data governance is implemented at the evaluation-set level for each gate. Article 12 logging is satisfied by the lineage record of gate decisions, monitored signals, and revocation events. Article 14 human oversight is implemented by the supervised-mode tier and by the certifying-evaluator role at each gate. Article 15 accuracy, robustness, and cybersecurity are evidenced per-capability rather than globally, which is what high-risk medical use actually requires.
HIPAA compliance is preserved because the monitored signals and lineage records can be maintained as de-identified evaluation telemetry under the Safe Harbor or Expert Determination methods, and because the certification tokens themselves contain no PHI. The institutional Business Associate Agreement governs the flow of any identifiable evaluation data into the gate evaluator, which is a tractable contracting problem rather than a structural one.
6. Adoption Pathway
Adoption begins at the institutional layer, not at the regulator. A health system deploying medical AI defines its initial capability namespace by intersecting the vendor's claimed capabilities with the institution's clinical use cases and with the relevant ABMS content domains. The institution stands up a gate-evaluator function — typically a clinical informatics committee with specialty representation — that authors evidence-portfolio thresholds for each capability and signs the resulting certification tokens. The deployed system is configured to refuse any action outside its currently credentialed envelope.
Vendor adoption follows. Vendors who instrument their inference pipelines to verify capability tokens, expose evaluation hooks for institutional gate evaluators, and emit lineage records for monitored signals become deployable in skill-gated institutions without bespoke integration. The vendor differentiation moves from "our model scored X on benchmark Y" to "our model exposes the capability lattice your committee can govern," which is a more durable commercial position because it survives model updates.
Regulator adoption is the slowest but the most consequential. FDA can incorporate skill-gating evidence structure into PCCP submissions and into 510(k) and De Novo review without statutory change, because the agency already accepts evidence portfolios; what changes is the per-capability granularity. State medical boards can begin treating institutional skill-gating governance as analogous to delegated practice, where the AI's authority is bounded by the institution's credentialing process much as a physician assistant's authority is bounded by a supervising physician's delegation. FSMB and ABMS can publish capability ontologies aligned to USMLE and specialty content outlines, providing the shared namespace that vendors and institutions need.
Eventually the regulatory destination is a structural one: medical AI systems carry verifiable capability credentials that map onto the same registries that already verify human practitioner credentials, and the public-facing FSMB and ABMS lookup services answer the same kind of query for an AI clinical agent that they already answer for a physician. That destination does not require new statutes. It requires the architectural primitive that makes AI clinical authority decomposable, evidenced, monitored, and revocable. Skill gating is that primitive, and provisional 64/049,409 is its disclosed structural form.