LLM and Skill Gating for Aviation Pilot Training Systems

by Nick Clark | Published March 27, 2026

Aviation pilot training is one of the most heavily regulated competence progressions in any profession, and it is one of the few in which the regulator has spelled out the evidence required for each privilege at the level of the individual maneuver, tolerance, and condition. FAA Part 61 governs the certification of individual airmen, Part 121 and Part 135 govern the operating certificates under which they fly, Part 142 governs the training centers and aviation training devices that increasingly carry the load of practical instruction, and Part 117 governs the flight time and rest cycles inside which all of that learning has to occur. EASA Part-FCL and ICAO Annex 1 extend the same evidence-gated progression internationally, FAA Advisory Circular 60-28 codifies the English language proficiency standard that mediates the entire system, MIL-STD-1797 governs the flying-qualities envelope that determines simulator credit, and Part 142 ATD authorization rules determine which evidence collected in which device is admissible. Inside this perimeter, AI-assisted training has begun to take on roles that were previously reserved for instructors and examiners, and the question is no longer whether language models can deliver instruction but whether they can be trusted to advance a student from one privilege to the next on the regulator's evidentiary terms. Skill gating provides the curriculum engine that turns AI-assisted instruction into evidence-gated capability progression, where each privilege is earned through demonstrated competence, recorded against a regulator-readable evidence portfolio, and continuously reverified through regression monitoring tied to the same currency rules the FAA, EASA, and ICAO already enforce.


Regulatory Framework

Pilot training is governed by an interlocking stack of national, regional, and international rules whose common feature is that competence is documented at the level of specific demonstrated behaviors. In the United States, FAA Part 61 defines the aeronautical experience, knowledge, and practical test standards required for each pilot certificate and rating, from student through Airline Transport Pilot, with each maneuver enumerated in the Airman Certification Standards (ACS) and each tolerance specified numerically. Part 61.57 defines the recent flight experience required to act as pilot in command, including instrument currency in 61.57(c) and takeoff and landing currency in 61.57(a) and (b). Part 121 and Part 135 layer on the operating-certificate-specific training, checking, and currency requirements for scheduled and on-demand carriers, including recurrent training intervals under 121.433 and 135.299, line-oriented evaluations, line checks, and proficiency checks. Part 142 governs FAA-certified training centers and the use of Aviation Training Devices, Flight Training Devices, and Full Flight Simulators in lieu of aircraft time, with credit allowances tied to device fidelity levels and to the structure of the approved training course. FAA Advisory Circular 60-28 establishes the English language proficiency standard required by ICAO and enforced through Part 61 and Part 65.

Internationally, EASA Part-FCL mirrors and extends the FAA structure with its own competency-based training and assessment philosophy under the Evidence-Based Training framework codified in ICAO Doc 9995, and ICAO Annex 1 establishes the global floor for licensing. MIL-STD-1797 governs the flying qualities standards that underlie military training device fidelity and that bound the conditions under which simulator-collected evidence is admissible for a given maneuver. FAR Part 117 defines the fatigue-related flight and duty time limits for Part 121 flightcrews, which constrain when and how training can be delivered and which implicitly weight the evidentiary value of a maneuver flown at the end of a long duty period. Each of these documents shares an underlying assumption: a pilot's privileges are not a status but a continuously maintained competence state, evidenced through specified maneuvers performed to specified tolerances under specified conditions, witnessed and signed off by a qualified instructor or examiner. The regulator does not care that the student feels ready. The regulator does not care that the AI tutor is confident. The regulator cares that the student has demonstrated, on the record, that they meet the airman certification standards, and that the record is structured the way the surveillance pipeline expects.

Architectural Requirement

The architectural requirement implied by this stack is that any system participating in pilot training, whether human instructor, simulator, or AI tutor, must produce evidence that maps cleanly onto the regulator's privilege ladder and that can be enumerated in the same shape the regulator's surveillance program expects. Each privilege, from first solo through cross-country endorsement, instrument rating, multi-engine rating, type rating, ATP-CTP, initial operating experience, and recurrent qualification, has prerequisites that must be demonstrably satisfied before the privilege is exercised. The training system must therefore know, at any moment, what privileges a given student currently holds, what evidence supports each privilege, where that evidence was collected, in what device or aircraft, with which instructor signature, and what gates remain open between the student's current state and the next privilege.

The system must also know what evidence has decayed. Aviation regulation is unusual in that competence has an explicit shelf life: instrument currency lapses without recent approaches under 61.57(c), takeoff and landing currency lapses without recent operations under 61.57(a) and (b), Part 121 recurrent training is required at intervals defined in the operator's approved training program, type-specific currency is enforced through 135.293 and 135.297, and the EASA OPC runs on a six-month cycle (with the LPC on a twelve-month cycle) for commercial operations. A training architecture that records initial competence but not its decay produces an evidence trail that drifts out of sync with the regulator's actual standard within a single quarter. Conversely, a system that requires re-demonstration of every skill on every session ignores the regulator's own framing, which is that demonstrated competence persists for a defined window and then must be refreshed through specified recent experience or a specified check event.

What is needed is a training surface where each skill is a gated capability, each gate has explicit evidence requirements traceable to the airman certification standards or the operator's approved training program, each piece of evidence has a known decay function tied to the regulatory currency rule that governs it, and the AI tutor's behavior (what it offers the student, what it withholds, what it recommends to the human instructor, what it permits the student to attempt in a Level D simulator versus a BATD) is governed by the current state of those gates rather than by an undifferentiated pool of training content. The graph is the curriculum, and the graph is also the audit artifact.
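As a concrete sketch, the graph-is-the-curriculum idea reduces to a prerequisite map plus a query for which gates are currently open. The capability names and data shapes below are illustrative, not a real syllabus encoding:

```python
# Illustrative prerequisite map: capability -> set of prerequisite capabilities.
# These names are examples, not a complete Part 61 syllabus.
CURRICULUM = {
    "straight_and_level": set(),
    "slow_flight": {"straight_and_level"},
    "power_off_stall": {"slow_flight"},
    "ground_reference": {"power_off_stall"},
}

def open_gates(certified):
    """Capabilities not yet certified whose prerequisites are all certified."""
    return {
        cap for cap, prereqs in CURRICULUM.items()
        if cap not in certified and prereqs <= certified
    }
```

A student's state is then just the set of certified capabilities, and the tutor's offer surface at any moment is `open_gates(state)`.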

Why Procedural Compliance Fails

The dominant procedural pattern for AI-assisted flight training today is content delivery with a logbook on the side. The AI tutor presents lessons, drills procedures, and runs scenario-based exercises. The student's performance is captured in a narrative form, sometimes with numerical scores, and the human instructor reviews the narrative before signing the logbook. This pattern fails the regulatory framework in three structural ways.

First, it does not gate. A student using a content-delivery tutor can attempt any maneuver in any sequence regardless of whether they have demonstrated the prerequisites. The tutor will obligingly run a holding pattern lesson for a student who has not yet demonstrated reliable altitude control, because the tutor has no model of what the student has earned. The Part 61 sequencing that exists in the regulator's mind exists nowhere in the tutor's behavior, and the Part 142 approved course outline that exists on paper at the training center is not enforced inside the tutor's session loop.

Second, it does not produce evidence in a regulator-readable form. The narrative scores cannot be rolled up into the airman certification standards rubric without manual reconstruction by the instructor, which defeats the efficiency case for the AI tool in the first place and leaves the FAA principal operations inspector unable to use the AI's output as primary evidence during a surveillance visit.

Third, it does not detect regression. A student who passed a stall recovery exercise three weeks ago is presumed competent in stall recovery today, even though Part 61 currency rules and basic airmanship both reject that presumption, and even though the operator's approved training program may explicitly require re-demonstration after a defined interval.

Logbook integration alone does not fix this. A logbook records hours and endorsements; it does not record the structure of competence. A Part 142 training center's approved course outline is closer to the right shape, because it specifies the maneuvers, tolerances, and sequence, but the AI tutor that runs inside such a course is still typically a content layer rather than a gating layer, and the gap between what the course outline requires and what the tutor enforces is filled by the human instructor's discretion. That discretion is exactly what the regulator regulates, and exactly what an AI-augmented program is supposed to make more consistent rather than less. The procedural pattern produces hours and narrative; the regulatory pattern requires evidence and structure. Worse, the procedural pattern can produce a confident-sounding training record that makes the student look more prepared than they are, which is a known precursor to checkride failures and, in the worst cases, to inadequately trained pilots reaching the line.

What AQ Primitive Provides

Skill gating treats each pilot privilege as a capability that must be earned, evidenced, and maintained. The curriculum is expressed as a directed graph of capabilities: straight-and-level flight, coordinated turns, climbs and descents, slow flight, power-on and power-off stalls, ground reference maneuvers, normal and crosswind takeoffs and landings, emergency procedures, navigation by pilotage and dead reckoning, radio navigation, instrument scan, partial-panel flight, holding patterns, instrument approaches (precision and non-precision, coupled and hand-flown), upset prevention and recovery, crew resource management, threat and error management, and so on through the full Part 61 and Part-FCL syllabus. Each capability has an evidence schema drawn directly from the airman certification standards: the specific maneuver, the specific tolerances (altitude in feet, heading in degrees, airspeed in knots, bank in degrees), the specific conditions (day or night, VMC or IMC, simulated or actual instrument), and the specific judgment elements the examiner is expected to evaluate.
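A minimal sketch of what an evidence schema might look like in code, with hypothetical field names and example numbers loosely patterned on ACS-style numeric tolerances (the values are illustrative, not an authoritative ACS transcription):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceSchema:
    """One capability's evidence requirements; field names are hypothetical."""
    maneuver: str
    altitude_tol_ft: int      # example tolerance, not an ACS transcription
    heading_tol_deg: int
    airspeed_tol_kt: int
    conditions: frozenset     # e.g. {"day", "VMC"}
    judgment_elements: tuple  # ACS-style task elements the evaluator scores

SLOW_FLIGHT = EvidenceSchema(
    maneuver="slow_flight",
    altitude_tol_ft=100,
    heading_tol_deg=10,
    airspeed_tol_kt=10,
    conditions=frozenset({"day", "VMC"}),
    judgment_elements=("airspeed_management", "coordination"),
)

def within_tolerance(schema, alt_dev_ft, hdg_dev_deg, spd_dev_kt):
    """Did a recorded attempt stay inside the schema's numeric tolerances?"""
    return (abs(alt_dev_ft) <= schema.altitude_tol_ft
            and abs(hdg_dev_deg) <= schema.heading_tol_deg
            and abs(spd_dev_kt) <= schema.airspeed_tol_kt)
```

The point of the schema is that a recorded attempt either meets it or does not; judgment elements still go to the human evaluator, but the numeric portion is machine-checkable.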

Each capability is initially locked. The AI tutor can teach toward an unlocked capability, can drill it, can simulate it, and can collect performance evidence, but it cannot certify it. Certification occurs when the evidence portfolio meets the schema and a qualified human (the instructor of record, a check airman, or a designated pilot examiner) countersigns. Once certified, the capability unlocks the capabilities that depend on it: stall recovery becomes a prerequisite that has been satisfied, allowing slow flight and ground reference work to proceed; navigation competence unlocks cross-country planning; instrument scan unlocks partial-panel work and approaches; basic instrument approaches unlock circle-to-land and non-standard approaches. The graph encodes the regulator's sequencing rather than relying on the tutor or the student to remember it, and the device authorization (BATD, AATD, FTD Level 5/6, FFS Level C/D, or aircraft) is recorded against each evidence entry so that Part 142 credit allowances are honored automatically.
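The lock-until-countersigned rule can be sketched as a certification function that refuses to advance a student without satisfied prerequisites, a non-empty evidence portfolio, and a human signature. Names and structure are hypothetical:

```python
# Hypothetical certification step: the AI can recommend, but cannot certify.
SYLLABUS = {
    "straight_and_level": set(),
    "slow_flight": {"straight_and_level"},
}

def certify(capability, evidence, signature, certified):
    """Advance a student only when prerequisites are met, evidence is on
    file, and a qualified human has countersigned; raises otherwise."""
    if SYLLABUS[capability] - certified:
        raise ValueError("prerequisites not satisfied")
    if not evidence:
        raise ValueError("empty evidence portfolio")
    if signature is None:
        # structural anti-gaming: the AI cannot self-promote a student
        raise ValueError("human countersignature required")
    certified.add(capability)
    return certified
```

The signature check is deliberately a hard failure rather than a warning; the tutor's recommendation and the human's certification remain distinct events in the record.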

The decay function is part of the capability. Instrument approach competence decays on the Part 61.57(c) schedule. Takeoff and landing currency decays on the 61.57(a) and (b) schedule. Part 121 recurrent training intervals are encoded as decay triggers for the corresponding capabilities, as are the EASA six-month OPC and twelve-month LPC cycles. When a capability decays, the AI tutor's behavior changes: it surfaces the decay to the student and instructor, it adjusts the curriculum to include refresher work, and it withholds operations that depend on the now-decayed prerequisite. The student does not have to remember that they are out of currency. The system enforces the consequence the regulator already requires.
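A simplified illustration of a decay check, loosely patterned on the 61.57(c) instrument currency window. The real rule counts six calendar months and specific event types (approaches, holding, intercepting and tracking); this sketch uses plain days and undifferentiated events for brevity:

```python
from datetime import date, timedelta

def is_current(event_dates, today, required=6, window_days=183):
    """True if at least `required` qualifying events fall within the window.
    Simplification: 61.57(c) counts six calendar months and distinguishes
    event types; this illustrative check uses plain days and generic events."""
    cutoff = today - timedelta(days=window_days)
    return sum(1 for d in event_dates if d >= cutoff) >= required
```

When `is_current` flips to false, the capability's state changes and every downstream gate that depends on it closes automatically.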

Anti-gaming is structural. Because evidence is bound to specific maneuvers performed to specific tolerances under specific conditions in specific devices, the student cannot accumulate credit by repeating easy variants in a low-fidelity device. Because human countersignature is required for certification, the AI cannot self-promote a student through gates. Because the evidence portfolio is the artifact, not a score, an examiner or check airman can audit the basis of any certification by reading the same record the system used to recommend it, and the FAA principal operations inspector can sample the record during surveillance the same way they sample paper records today.

Compliance Mapping

Each regulatory regime maps cleanly onto the gating structure. FAA Part 61 maneuvers and the airman certification standards become the evidence schemas for the corresponding capabilities, with the practical test standards defining the tolerances and the ACS task elements defining the judgment criteria. Part 61.57 currency rules become decay functions on the affected capabilities, executed automatically rather than tracked manually in a paper logbook. Part 121 and Part 135 recurrent training and proficiency check requirements become operator-specific capability sets layered on top of the Part 61 base, with their own intervals, their own evidence requirements, and their own approved-training-program references. Part 142 approved training course outlines map onto the curriculum graph, with simulator credit allowances expressed as which capabilities can be evidenced in which device level, so that an evidence entry collected in a BATD does not satisfy a requirement that calls for a Level D FFS.
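The device-credit rule can be made concrete as an admissibility check on each evidence entry. The strict ranking below is a simplifying assumption; real Part 142 credit allowances come from the approved course outline and are not always a simple linear order:

```python
# Hypothetical fidelity ranking, used only to make the admissibility check
# concrete; actual credit allowances are set per-capability by the approved
# course outline, not by a global ordering like this one.
DEVICE_RANK = {"BATD": 0, "AATD": 1, "FTD_5": 2, "FTD_6": 3, "FFS_C": 4, "FFS_D": 5}

def evidence_admissible(collected_in, required_level):
    """Evidence counts only if collected at or above the required device level."""
    return DEVICE_RANK[collected_in] >= DEVICE_RANK[required_level]
```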

EASA Part-FCL competency-based training and assessment maps onto the same capability graph with European-specific evidence schemas and the EBT framework's nine pilot competencies, whose observable behaviors serve as cross-cutting evidence dimensions. ICAO Annex 1 floor requirements are encoded as the minimum capability set for international license recognition, allowing a license issued under one regime to be evaluated against another's requirements through graph comparison rather than narrative review. AC 60-28 English language proficiency becomes a cross-cutting capability gate on radio communication operations, with its own ICAO Level 4/5/6 decay schedule. MIL-STD-1797 flying qualities define the fidelity envelope inside which simulator-based evidence is admissible for given capabilities, formalizing what is currently negotiated case by case between training providers and inspectors. Part 117 fatigue rules constrain when training events can be scheduled and how their evidence is weighted, since a maneuver flown at the end of a long duty period is not equivalent to one flown rested, and the gating engine can refuse to count evidence collected outside acceptable fatigue conditions. The compliance mapping is not a translation layer bolted onto a content tutor; it is the native shape of the curriculum graph, and it is the same shape the regulator's surveillance program is already organized around.
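In the simplest case, graph comparison for license recognition reduces to set difference over shared capability identifiers. A sketch under that assumption (real cross-regime mapping also has to reconcile differing schemas, not just names):

```python
def recognition_gap(holder_capabilities, target_minimum):
    """Capabilities the target regime's floor requires that the holder has
    not evidenced; an empty set means the floor is met. Assumes both regimes
    are expressed over a shared capability vocabulary, which is itself the
    hard part of cross-regime mapping."""
    return set(target_minimum) - set(holder_capabilities)
```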

Adoption Pathway

Adoption begins inside a Part 141 or Part 142 program where the approved course outline already provides the structure that the gating engine will encode. The first deployment is a single phase of training, typically the private pilot syllabus or an initial type rating, where the airman certification standards are well-defined, the device authorizations are stable, and the evidence schemas can be authored quickly from the existing course materials. The AI tutor runs inside this phase, collecting evidence against the gates, recommending readiness to the instructor of record, and producing the audit-ready portfolio that the chief instructor, the training center evaluator, and the FAA principal operations inspector can review during the next surveillance cycle. The first deployment is sized to be evaluated against an existing checkride pass-rate baseline, so that the gating engine's value is measurable in the same units the program is already accountable for.

The second phase extends across the full ab initio progression and into instrument and commercial training, and pulls in EASA Part-FCL evidence schemas where the school operates internationally or trains students who will be licensed under both regimes. This is the phase where the decay functions and regression monitoring earn their value, because students are now holding multiple capabilities at once, the device mix spans BATDs through FFSs, and the system has to track which capabilities are aging in which conditions. It is also the phase where the gating engine begins to produce surveillance-grade reporting (snapshots of program-wide competence distributions, decay-driven retraining demand, evidence-collection bottlenecks) that the chief instructor can use as a management tool and the FAA can use as a surveillance input.
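A surveillance-grade snapshot can be sketched as a simple aggregation over per-student capability states. The state labels and input shape below are hypothetical:

```python
from collections import Counter

def competence_snapshot(students):
    """Aggregate per-student capability states into program-wide counts.
    Input shape: {student_id: {capability: state}}; the state labels
    ("certified", "decayed", "locked") are illustrative."""
    counts = Counter()
    for capabilities in students.values():
        counts.update(capabilities.values())
    return counts
```

The same aggregation, sliced by capability instead of by state, would surface the evidence-collection bottlenecks mentioned above.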

The third phase integrates with Part 121 and Part 135 operator recurrent training, where the gating engine becomes the operator's qualification ledger: who is currently qualified for which fleet, which seat, which operation, with which evidence and which expirations, governed by the operator's FAA-approved training program and visible to the principal operations inspector on demand. At this scale the gating engine is no longer a training tool but the system of record for airman qualification, sitting alongside the operator's training records, feeding the FAA's Aviation Safety Information Analysis and Sharing pipeline, and supporting EASA's equivalent oversight where applicable. The AI tutor remains the delivery surface, but the gating engine is what makes its output regulator-grade, and the auditable graph of capabilities, evidence, decay, and signatures is what makes the entire AI-augmented training program defensible during a checkride, a 121.135 surveillance visit, or a post-incident investigation.

Invented by Nick Clark
Founding Investors: Anonymous, Devin Wilkie