LLM and Skill Gating for Manufacturing Quality Systems

by Nick Clark | Published March 27, 2026

Manufacturing quality control determines whether products meet specifications before reaching customers. Human quality inspectors earn certification through training and demonstrated competence on specific product types and defect categories. AI quality systems, by contrast, are typically deployed on the strength of aggregate detection metrics, without product-specific competence validation or continuing performance monitoring. Skill gating applies the quality certification framework to manufacturing AI: the system must demonstrate competence on each product type and defect category before it earns inspection authority, and regression detection catches declining accuracy before defective products reach customers.


1. Regulatory Framework

Manufacturing quality is one of the most heavily codified domains in industrial regulation, and the codification has a single recurring shape: competence must be demonstrated, documented, and bounded before authority to make quality-affecting decisions is granted. ISO 9001:2015 clause 7.2 obligates organizations to determine the necessary competence of persons doing work that affects quality performance, to ensure that those persons are competent on the basis of education, training, or experience, and to retain documented information as evidence of competence. Clause 7.1.5 imposes parallel requirements on monitoring and measurement resources, requiring that they be verified or calibrated against measurement standards traceable to international or national standards before use. The standard treats competence as a per-task property, not a global attribute of a worker or instrument.

Sector-specific regimes layer additional structure on the same axis. ISO 13485 for medical devices and the FDA's 21 CFR Part 820 Quality System Regulation require that personnel performing inspection, testing, or release activities be qualified for the specific products and procedures they handle, with qualification records maintained and reviewed during audit. IATF 16949 for automotive adds layered process audits, control plans tied to specific part numbers, and product-specific PPAP qualification gates. AS9100 for aerospace adds first-article inspection requirements that gate production release on demonstrated process capability against the specific drawing revision and material lot. The Nadcap accreditation system for special processes, GMP under 21 CFR Part 211 for pharmaceutical manufacturing, and the FSMA preventive controls framework for food all rest on the same foundation: competence is product-specific, evidence-bound, and revocable when conditions change.

AI inspection systems are entering this regulated space without an architectural mapping to the framework. Notified bodies, FDA inspectors, and customer-quality auditors are now asking the operational question that the standards have always asked of human inspectors and physical instruments: what is this system qualified to do, what evidence supported the qualification, and how is the qualification maintained when the product, process, or environment changes. The standards do not exempt automated systems; they apply with full force, and the absence of a structural answer is increasingly being treated as a finding.

2. Architectural Requirement

The architectural requirement that follows from the regulatory framework is a structural binding between a quality-affecting decision and an evidence-supported, scope-limited authorization to make that decision. The binding must be checkable at the moment of decision, not reconstructed after the fact from training records and validation studies stored elsewhere. The scope must be expressible at the granularity the standards require: product type, defect category, severity range, measurement modality, and operating envelope. The authorization must be revocable on a per-scope basis when evidence indicates that the demonstrated competence is no longer present, without collapsing the entire system's authority.
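A minimal sketch of that binding follows, with a Python data structure standing in for whatever representation a deployment actually uses; the field names and example values are illustrative assumptions, not a prescribed schema.

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class InspectionScope:
    """One product-defect-modality tuple, at the granularity the standards require."""
    product_family: str                 # e.g. "stamped-bracket-A"
    defect_category: str                # e.g. "surface-scratch"
    severity_range: tuple[int, int]     # e.g. (1, 3) on the plant's severity scale
    modality: str                       # e.g. "2D-vision"
    operating_envelope: str             # e.g. "line-3 / lighting-rev-2"

@dataclass
class Authorization:
    """Evidence-backed, revocable permission to decide within exactly one scope."""
    scope: InspectionScope
    evidence_ref: str                   # pointer to the validation corpus of record
    issued_by: str                      # signing authority in the published taxonomy
    issued_at: datetime
    revoked: bool = False
    revocation_reason: str = ""

    def revoke(self, reason: str) -> None:
        # Revocation is per-scope; other authorizations held by the system are untouched.
        self.revoked = True
        self.revocation_reason = reason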

The same architectural shape that the standards impose on a human inspection workforce — qualification-by-task, evidence-of-record, scope-bounded authority, suspension on regression — must be expressible inside the inspection system itself. This is not an inspection-UX problem or a logging problem. It is a control-flow problem. Every accept-or-reject decision the system emits must be the output of a code path that has, before emitting, verified that a non-revoked, evidence-backed authorization exists for the specific product-defect-modality scope of the decision being made. Decisions outside scope must be either refused, downgraded to advisory, or escalated to a qualified human, with the disposition itself recorded as a governance event.

This requirement composes hierarchically. A line-level inspection system may have authority for surface defects on product family A but not for dimensional inspection on product family B. A plant-level quality system may have authority to release lots from a process that has demonstrated capability but not from a newly introduced process pending PPAP. A corporate quality system may have authority to sign off on supplier qualification only against suppliers within a credentialed audit scope. The architecture must support all three scopes simultaneously and must keep the scopes structurally distinguishable so a regulator or customer can audit each level independently.
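One way to keep the levels structurally distinguishable is to make the authority level part of the grant itself; the sketch below is illustrative, and the level names, holders, and scope keys are assumptions.

from dataclasses import dataclass
from enum import Enum

class AuthorityLevel(Enum):
    LINE = "line"             # per-station accept/reject decisions
    PLANT = "plant"           # lot-release decisions
    CORPORATE = "corporate"   # supplier-qualification sign-off

@dataclass(frozen=True)
class ScopedAuthority:
    level: AuthorityLevel
    holder: str               # e.g. "line-3-vision-station" or "plant-QMS"
    scope_key: str            # e.g. "family-A/surface-scratch/2D-vision"

# Grants at different levels never imply one another and can be audited independently.
grants = [
    ScopedAuthority(AuthorityLevel.LINE, "line-3-vision-station", "family-A/surface-scratch/2D-vision"),
    ScopedAuthority(AuthorityLevel.PLANT, "plant-QMS", "process-17/lot-release"),
    ScopedAuthority(AuthorityLevel.CORPORATE, "corporate-QMS", "supplier-042/audit-scope"),
]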

3. Why Procedural Approaches Fail

The prevailing approach to AI quality deployment is procedural: a validation study is conducted, aggregate accuracy metrics are computed, a deployment memo is signed by quality engineering, and the system goes live with monitoring dashboards. This approach fails the architectural requirement on every axis the standards care about, and the failure mode is increasingly visible in audit findings.

Aggregate metrics conceal scope-level incompetence. A vision system reporting 99.4 percent overall accuracy on a validation set may be performing at 99.9 percent on the dominant product variant and 78 percent on a low-volume variant whose defect signatures are underrepresented in training data. The aggregate passes the deployment threshold while the scope-level performance fails on exactly the variant where escapes are most expensive to discover. The standards have always required scope-level qualification precisely because aggregate competence does not imply per-task competence; the same lesson now applies to AI.
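A short numeric sketch makes the concealment concrete; the volume split below is an assumption chosen so the weighted aggregate reproduces the figures quoted above.

# Per-variant validation results (illustrative numbers; the volume split is an
# assumption chosen so the weighted aggregate matches the quoted 99.4 percent).
variants = {
    "variant_A": {"volume_share": 0.977, "accuracy": 0.999},  # dominant product
    "variant_B": {"volume_share": 0.023, "accuracy": 0.780},  # low-volume product
}

aggregate = sum(v["volume_share"] * v["accuracy"] for v in variants.values())
print(f"aggregate accuracy: {aggregate:.3f}")   # ~0.994, passes a 99 percent threshold

for name, v in variants.items():
    status = "PASS" if v["accuracy"] >= 0.99 else "FAIL"
    print(f"{name}: {v['accuracy']:.3f} -> {status} at scope level")
# variant_B fails scope-level qualification even though the aggregate passes.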

Deployment-time validation does not survive process change. A process engineering change — a new supplier of stamped components, a tooling refurbishment, a lighting fixture replacement, an HVAC adjustment that alters condensation behavior — silently shifts the defect distribution the system encounters in production. The validation evidence supporting the original deployment no longer reflects the operating conditions. Procedural approaches address this through periodic revalidation campaigns, but the gap between change and revalidation is exactly the window in which uncontrolled escapes occur, and the campaigns themselves are organizationally heavy enough that they are deferred or skipped under production pressure.

Authority is not bounded. A system deployed for surface inspection on product family A is technically capable of running inference on images of product family B, and operational pressure routinely extends scope informally — a new product line is added to the conveyor, the system is pointed at it, and its outputs are consumed without a fresh qualification pass because the change is treated as configuration rather than as a new authority grant. The standards prohibit this; the procedural approach has no structural mechanism to enforce the prohibition. The system itself does not know what it is qualified for and cannot refuse work outside its scope.

Regression is detected late or not at all. Performance monitoring dashboards are reviewed by humans on a cadence that lags the regression by days or weeks, and the human review must distinguish a real regression from normal variation, supplier mix shifts, or seasonal effects. By the time a regression is escalated, decisions made under the regressed scope have already shipped. Targeted suspension of the affected scope while preserving the rest is rarely implemented because the deployment was never decomposed into scopes to begin with; the practical response is either to ignore the alert or to take the entire system offline, both of which are operationally untenable.

4. The AQ Skill-Gating Primitive

The Adaptive Query skill-gating primitive, disclosed under USPTO provisional 64/049,409, supplies the architectural binding the standards require. The primitive defines a curriculum of skills, where each skill is a structured assertion that the system has demonstrated competence on a specifically scoped task — a product-defect-modality tuple, in the manufacturing quality case — supported by a corpus of evidence (validation samples, accuracy bounds, threshold calibrations, false-positive and false-negative envelopes) and signed by an authority within a published taxonomy. Skills are explicit, named, and separately revocable.
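A minimal sketch of what such a skill token might carry follows; the field names are assumptions, and an HMAC stands in for whatever signature scheme a deployment actually uses, since the primitive is neutral on that choice.

import hashlib
import hmac
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Skill:
    """A named, separately revocable assertion of scoped competence."""
    skill_id: str                    # e.g. "line3/bracket-A/surface-scratch/2D-vision"
    system_instance: str             # the specific deployed model/station it binds to
    scope: dict                      # product-defect-modality tuple plus operating envelope
    evidence_corpus_ref: str         # lineage pointer to validation samples and bounds
    acceptance_criteria_ref: str     # the pre-published criteria the gate evaluated
    issuing_authority: str           # authority within the published taxonomy

def sign_skill(skill: Skill, authority_key: bytes) -> str:
    """Bind the token to its contents; any later edit invalidates the signature."""
    payload = json.dumps(asdict(skill), sort_keys=True).encode()
    return hmac.new(authority_key, payload, hashlib.sha256).hexdigest()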

A skill is unlocked through an evidence gate that evaluates a candidate corpus against pre-published acceptance criteria for that skill. The criteria are not an aggregate accuracy threshold; they are a scope-specific statement of what the system must demonstrate to earn the skill — coverage of every relevant defect type at every relevant severity level, false-positive performance against defect-free samples, threshold calibration against borderline cases, and stability under the environmental variation expected in production. The gate is run by the issuing authority, the resulting skill token is signed and bound to the system instance and the scope, and the evidence corpus is recorded as lineage.
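A sketch of how such a gate might be evaluated follows; the criteria names, thresholds, and evidence-corpus field names are illustrative assumptions rather than the disclosed format.

from dataclasses import dataclass

@dataclass
class AcceptanceCriteria:
    """Pre-published, scope-specific criteria a candidate corpus must meet."""
    required_defect_severities: set     # every (defect_type, severity) cell that must be covered
    min_recall_per_cell: float          # detection rate required in each covered cell
    max_false_positive_rate: float      # against defect-free samples
    min_borderline_agreement: float     # threshold calibration against borderline reference cases

def evaluate_gate(corpus: dict, criteria: AcceptanceCriteria) -> tuple[bool, list[str]]:
    """Return (passed, findings). The gate is scope-specific, not an aggregate threshold."""
    findings = []

    # 1. Coverage: every relevant defect type at every relevant severity level.
    missing = criteria.required_defect_severities - set(corpus["recall_by_cell"])
    if missing:
        findings.append(f"uncovered defect/severity cells: {sorted(missing)}")

    # 2. Per-cell detection performance.
    for cell, recall in corpus["recall_by_cell"].items():
        if recall < criteria.min_recall_per_cell:
            findings.append(f"recall {recall:.3f} below bound in {cell}")

    # 3. False-positive performance against defect-free samples.
    if corpus["false_positive_rate"] > criteria.max_false_positive_rate:
        findings.append(f"false-positive rate {corpus['false_positive_rate']:.3f} above bound")

    # 4. Threshold calibration against borderline cases.
    if corpus["borderline_agreement"] < criteria.min_borderline_agreement:
        findings.append("borderline calibration below bound")

    return (not findings, findings)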

At decision time, the inspection system consults the skill set before emitting any quality-affecting output. A decision proposed within the scope of an unlocked skill is admitted; a decision proposed outside any unlocked skill is structurally refused or downgraded to advisory with a recorded escalation. The check is not a logging hook; it is a precondition of the actuation, encoded so that the system cannot emit an autonomous accept-or-reject decision in a scope where it has no skill. This is the structural shape of human inspector qualification, expressed inside the AI control flow.
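A sketch of that actuation-path precondition follows; the disposition names track the paragraph above, and the skill-set and lineage structures are assumptions carried over from the earlier sketches.

from enum import Enum

class Disposition(Enum):
    AUTONOMOUS = "autonomous"   # in-scope: the accept/reject decision is emitted
    ADVISORY = "advisory"       # out-of-scope: output surfaced but never actuated
    ESCALATED = "escalated"     # out-of-scope: routed to a qualified human inspector

def emit_decision(proposed: str, scope_key: str, skill_set: dict, lineage_log: list,
                  out_of_scope: Disposition = Disposition.ESCALATED):
    """Precondition of actuation: no non-revoked skill for the scope, no autonomous decision."""
    skill = skill_set.get(scope_key)
    if skill is not None and not skill.get("suspended", False):
        lineage_log.append({"event": "decision", "scope": scope_key, "result": proposed})
        return Disposition.AUTONOMOUS, proposed

    # The out-of-scope disposition is itself a recorded governance event.
    lineage_log.append({"event": "out_of_scope", "scope": scope_key,
                        "proposed": proposed, "disposition": out_of_scope.value})
    return out_of_scope, None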

Regression monitoring is bound to the same scope decomposition. Each skill carries acceptance criteria that are continuously re-evaluated against a sliding window of recent decisions and ground-truth feedback (rework records, customer returns, downstream test results). When the rolling evidence indicates that the criteria are no longer met, the skill is automatically suspended — not the entire system — and decisions in that scope are routed to human inspection until the skill is either re-earned through a fresh evidence pass or formally retired. The suspension event is itself a credentialed observation in the lineage record. The primitive is technology-neutral with respect to model architecture, signature scheme, and storage substrate, and composes hierarchically across line, plant, and enterprise scopes.
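A sketch of scope-level regression monitoring over a sliding window of ground-truth feedback follows; the window size, minimum sample count, and recall bound are illustrative.

from collections import deque

class SkillMonitor:
    """Re-evaluate one skill's acceptance criterion over a rolling window of feedback."""

    def __init__(self, skill: dict, min_recall: float, window: int = 500):
        self.skill = skill
        self.min_recall = min_recall
        self.outcomes = deque(maxlen=window)   # (system_said_defect, ground_truth_defect)

    def record_feedback(self, predicted_defect: bool, actual_defect: bool, lineage_log: list):
        self.outcomes.append((predicted_defect, actual_defect))
        defects = [p for p, a in self.outcomes if a]
        caught = [p for p, a in self.outcomes if a and p]
        if len(defects) >= 20 and len(caught) / len(defects) < self.min_recall:
            # Suspend only this skill; other scopes keep their authority.
            self.skill["suspended"] = True
            lineage_log.append({"event": "skill_suspended",
                                "skill": self.skill["skill_id"],
                                "observed_recall": len(caught) / len(defects)})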

5. Compliance Mapping

The skill-gating primitive maps directly onto the documentary requirements of every major manufacturing quality regime. Against ISO 9001 clauses 7.1.5 and 7.2, the skill set is the documented evidence of competence; the issuing authority taxonomy is the organizational structure determining who is qualified to qualify the system; the suspension lineage is the record of competence maintenance. An auditor asking to see competence records for an AI inspection station receives a structured, scope-decomposed, evidence-backed report that mirrors the structure they already use for human inspector records.
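A sketch of the scope-decomposed report such a request might return follows; the fields restate the skill-record fields from the earlier sketches and are assumptions.

def competence_report(skill_set: dict, lineage_log: list) -> list[dict]:
    """One row per scope: what the system is qualified for, on what evidence, and by whom."""
    suspensions = {e["skill"] for e in lineage_log if e.get("event") == "skill_suspended"}
    report = []
    for scope_key, skill in skill_set.items():
        report.append({
            "scope": scope_key,
            "status": "suspended" if skill["skill_id"] in suspensions else "active",
            "evidence_corpus": skill["evidence_corpus_ref"],
            "issued_by": skill["issuing_authority"],
            "issued_at": skill["issued_at"],
        })
    return report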

Against ISO 13485 and 21 CFR Part 820, the skill scopes align naturally with device master record structure: each product family carries its own skill set, each design change triggers a skill-level qualification refresh rather than a global revalidation, and the lineage record satisfies the design history file requirement for inspection equipment qualification. Against IATF 16949, the skill granularity supports per-part-number and per-control-plan qualification; PPAP submissions can include the relevant skill tokens and their evidence corpora as part of the quality submission package. Against AS9100, first-article inspection findings can be expressed as the evidence corpus for a skill bound to that specific drawing revision, and revision changes structurally invalidate the skill until a new first-article evidence pass is completed.

Against GMP under 21 CFR Part 211 and the FSMA preventive controls framework, the same shape supports computer system validation requirements: the skill is the validated state, the evidence corpus is the validation package, the suspension event is the change-control trigger, and the lineage record satisfies the data integrity expectations of ALCOA-plus. The mapping is not an after-the-fact reporting layer; it is a structural correspondence between the architectural primitive and the regulatory shape, and the correspondence is what makes the primitive defensible during audit rather than merely descriptive.

6. Adoption Pathway

Adoption begins at a single inspection station with a defined product-defect scope and follows the same evidence path the quality organization already uses for instrument qualification and operator certification. The first deployment selects a station where the existing validation package can be re-expressed as a skill evidence corpus, the resulting skill is signed by the same authority that signs current qualification records, and the skill-gating runtime is wired into the inspection system's actuation path. The visible behavior to operators and quality engineers is unchanged for in-scope decisions; the new behavior is the structural refusal of out-of-scope decisions and the automatic suspension on regression.
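A sketch of that re-expression step follows; the validation-package layout and the corpus field names (matching the gate sketch in section 4) are assumptions.

def corpus_from_validation_package(package: dict) -> dict:
    """Map an existing validation study onto the evidence-corpus fields the gate evaluates.
    No new validation work is implied; the same evidence is restated per scope."""
    return {
        "recall_by_cell": {
            (row["defect_type"], row["severity"]): row["detection_rate"]
            for row in package["per_defect_results"]
        },
        "false_positive_rate": package["fp_study"]["rate"],
        "borderline_agreement": package["calibration_study"]["agreement"],
    }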

The second phase extends the skill set to cover the station's full operating scope and adds the regression monitoring loop against rework, return, and downstream-test feedback. This phase is where the operational benefit becomes visible: scope-bounded suspensions replace whole-system shutdowns, targeted re-qualification replaces global revalidation campaigns, and quality engineering review shifts from chasing aggregate-metric anomalies to reviewing skill-level evidence. The corresponding audit posture shift is significant: a finding from a notified body or customer auditor about a specific product-defect scope can be answered with the skill record for that scope rather than a defense of the entire deployment.

The third phase composes across stations and plants. Plant-level quality systems consume skill records from line-level inspection stations as credentialed observations contributing to lot-release decisions; corporate quality systems consume plant-level skill aggregates for supplier-qualification and regulatory-submission purposes. The hierarchical composition is what makes the primitive defensible at the regulatory scale where it matters most: a single product recall investigation, warning-letter response, or supplier-disqualification dispute can be reconstructed from the lineage record without reconstructing the entire deployment history. The honest framing is that the primitive does not replace the quality management system; it supplies the architectural substrate that the quality management system has always assumed and that AI deployments have, until now, structurally lacked.
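A sketch of the plant-level consumption step follows; the release rule (every scope the lot requires must be covered by an active line-level skill) and the data structures are illustrative.

def lot_release_inputs(lot_scopes: list, station_skill_sets: dict) -> dict:
    """Plant-level check: every scope the lot requires must be covered by an active
    line-level skill; otherwise the lot routes to human disposition."""
    coverage = {}
    for scope_key in lot_scopes:
        covering = [
            station for station, skills in station_skill_sets.items()
            if scope_key in skills and not skills[scope_key].get("suspended", False)
        ]
        coverage[scope_key] = covering
    uncovered = [s for s, stations in coverage.items() if not stations]
    return {"release_eligible": not uncovered,
            "uncovered_scopes": uncovered,
            "coverage": coverage}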
