LLM and Skill Gating for Cybersecurity Skill Progression
by Nick Clark | Published March 27, 2026
Cybersecurity AI agents require capabilities that are themselves dangerous: vulnerability scanning tools, exploit frameworks, traffic analysis capabilities, credential-handling functions, and incident response actions that can disrupt production systems and trigger reportable events. Providing an AI agent with full offensive and defensive cybersecurity capabilities from deployment creates the same risk as handing a novice analyst a fully loaded penetration testing toolkit and an unrestricted SOAR console on day one. The cybersecurity workforce framework the federal government already uses, the NIST NICE Framework, treats capability as the product of demonstrated knowledge, skills, and tasks performed in defined work roles. The NIST Cybersecurity Framework 2.0 treats security operations as a continuously governed function, not a static deployment. The NIST AI Risk Management Framework treats AI capability expansion as a governance event, not a configuration change. DoD Directive 8140 treats every cybersecurity work role as a qualified position with maintained currency. Skill gating applies progressive capability unlocking to cybersecurity agents, requiring demonstrated competence at each level before more powerful and potentially dangerous tools unlock. Continuous regression monitoring then maintains skill currency as the threat landscape evolves and produces evidence in the same shape the existing workforce frameworks already require for human practitioners.
Regulatory Framework
Cybersecurity is one of the most explicitly competence-graded professions in the federal regulatory ecosystem. The NIST NICE Framework, formalized in NIST Special Publication 800-181 Revision 1, decomposes cybersecurity work into 52 work roles, each defined by a specific set of knowledge statements, skill statements, and tasks, and each tied to competency areas that include both technical and non-technical dimensions. DoD Directive 8140, which superseded 8570, requires that every individual performing a cybersecurity work role for the Department of Defense be qualified for that role through a combination of foundational and resident qualifications, with continuing education to maintain currency. The same NICE structure flows through the federal civilian workforce under OPM cybersecurity coding and through the contractor workforce under FedRAMP and CMMC.
NIST Cybersecurity Framework 2.0 establishes the Govern, Identify, Protect, Detect, Respond, and Recover functions as a continuous lifecycle, with the Govern function explicitly added in the 2.0 revision to acknowledge that cybersecurity capability is itself a governance object. NIST AI RMF, published as NIST AI 100-1, treats AI systems as objects whose risk profile changes when their capabilities change, and requires that capability expansion be governed by an explicit risk management process rather than by the implicit decisions of deployment engineers. CISA Binding Operational Directives, FedRAMP authorization boundaries, and the SEC cyber incident disclosure rules under Item 1.05 of Form 8-K each treat the question of who or what can take which action against which system as a governed question, not a configuration detail.
Layered onto the federal stack are the industry certification regimes that the cybersecurity profession uses to operationalize competence: ISC2 CISSP, CCSP, and SSCP; EC-Council CEH and CHFI for offensive and forensic work; CompTIA Security+, CySA+, and PenTest+ for foundational and intermediate roles; SANS GIAC for specialized depth; and Offensive Security OSCP and OSEP for hands-on offensive certification. Each of these regimes is, in effect, a public statement that the profession does not trust unrestricted offensive or defensive capability without documented progression, and DoDD 8140 codifies this by requiring specific certifications for specific work roles. An AI cybersecurity agent that operates inside this perimeter must produce capability progression evidence that can be reasoned about in the same vocabulary the existing frameworks use, or it cannot be governed by the same workforce that governs the human practitioners working alongside it.
Architectural Requirement
The architectural requirement implied by this stack is that any AI agent participating in cybersecurity operations must expose its capabilities as a graph of gated work-role-aligned competencies, each with explicit evidence requirements, each with explicit unlock conditions, and each with explicit maintenance requirements tied to the threat landscape it actually faces. The graph must mirror the NICE Framework's work role structure closely enough that an organization's CISO, an auditor under FedRAMP or CMMC, or a DoD information system security manager can map the AI's capabilities to the same work-role taxonomy used to qualify the human staff. The graph must distinguish observational capabilities (log analysis, alert triage, threat intelligence correlation) from active capabilities (endpoint interrogation, network traffic capture, vulnerability scanning) from intrusive capabilities (credential testing, controlled exploitation, privilege escalation in authorized engagements) from disruptive capabilities (network isolation, process termination, account disablement, automated containment).
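The four capability classes and the gated graph can be sketched as a small data model. This is an illustrative assumption about the schema, not a prescribed implementation; the class names, field names, and example roles are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum

class CapabilityClass(IntEnum):
    # Ordered by potential for harm, mirroring the four tiers in the text
    OBSERVATIONAL = 0   # log analysis, alert triage, threat-intel correlation
    ACTIVE        = 1   # endpoint interrogation, traffic capture, vuln scanning
    INTRUSIVE     = 2   # credential testing, controlled exploitation
    DISRUPTIVE    = 3   # isolation, process termination, account disablement

@dataclass
class Capability:
    name: str                          # e.g. "vulnerability-assessment"
    work_role: str                     # NICE work role it aligns to (placeholder)
    cap_class: CapabilityClass
    prerequisites: list[str] = field(default_factory=list)
    unlocked: bool = False

def invocable(name: str, graph: dict[str, Capability]) -> bool:
    """A capability is invocable only when it and every prerequisite
    (transitively) are unlocked; there is no undifferentiated grant."""
    cap = graph[name]
    return cap.unlocked and all(invocable(p, graph) for p in cap.prerequisites)
```

The key design point is that authorization is read off the graph at invocation time, so locking a prerequisite implicitly revokes everything above it.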
The system must also know what evidence has decayed against the threat landscape rather than against a calendar alone. Cybersecurity is unique in that a capability certified against last quarter's threat models may be silently inadequate against this quarter's, because the adversary actively iterates against the defender's capabilities. A skill currency model that decays only on time triggers will keep certifying agents whose detection logic no longer addresses the techniques cataloged in the most recent MITRE ATT&CK update or the most recent CISA Known Exploited Vulnerabilities additions. Conversely, a model that demands re-certification on every threat intelligence update is unworkable. What is needed is a graph in which each capability's currency is a function of both elapsed time and observed threat-landscape change, with the agent's behavior (what tools it can invoke, what scopes it can target, what disruptive actions it can take without human countersignature) governed by the current state of those gates rather than by an undifferentiated grant of authority. The graph is the agent's qualification record, the agent's authorization policy, and the audit artifact, all expressed as one object.
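One way to express currency as a joint function of elapsed time and threat-landscape drift is to take the minimum of a time factor and a threat-coverage factor. This is a sketch under assumed parameters; the idea of a "budget" of unevaluated new techniques is an illustrative knob, not a standard.

```python
from datetime import datetime

def capability_currency(certified_at: datetime, now: datetime,
                        max_age_days: int,
                        unevaluated_techniques: int,
                        technique_budget: int) -> float:
    """Currency in [0, 1]. Either aging past max_age_days or accumulating
    too many new, unevaluated techniques drives currency toward zero,
    triggering re-evaluation; whichever factor is worse dominates."""
    age_days = (now - certified_at).days
    time_factor = max(0.0, 1.0 - age_days / max_age_days)
    threat_factor = max(0.0, 1.0 - unevaluated_techniques / technique_budget)
    return min(time_factor, threat_factor)
```

Taking the minimum (rather than, say, the product) captures the text's point: a freshly certified capability is still stale if the adversary has moved, and a threat-current capability still expires on the calendar.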
Why Procedural Compliance Fails
The dominant procedural pattern for AI in cybersecurity today is full-capability deployment with policy guardrails on the side. The agent is provisioned with access to the SIEM, the EDR, the SOAR, the vulnerability scanner, and possibly the offensive tooling, and a separate policy layer attempts to restrict what the agent can do based on prompts, role-based access controls on the underlying tools, and human-in-the-loop checkpoints for high-impact actions. This pattern fails the regulatory framework in three structural ways.
First, it does not gate by competence. The agent has access to vulnerability scanning on day one whether or not it has demonstrated that it correctly distinguishes test from production, that it respects scope boundaries, or that it correctly assesses the impact of its findings. The NICE Framework's logic, that capability follows demonstrated knowledge and skill, is inverted; the agent has the capability and the organization hopes the competence will catch up. Second, it does not produce evidence in a workforce-readable form. When a CISO is asked under SEC cyber disclosure rules whether the AI agent that participated in an incident response was qualified for the actions it took, the answer is a configuration screenshot, not a competence record. When a DoD ISSM is asked whether an agent operating in a DoDD 8140-governed environment is qualified for its work role, there is no qualification artifact to point to. Third, it does not track threat-landscape decay. An agent whose detection logic was validated against ATT&CK techniques as of last year continues to operate with the same authority against this year's techniques, even though its actual effectiveness against the current adversary may have degraded substantially.
Policy-layer guardrails do not fix this. Guardrails restrict the action surface; they do not establish that the agent is qualified to act inside it. A penetration testing agent constrained by an allow-list of target IPs is not therefore competent at penetration testing; it is merely scoped. An incident response agent with a human-approval checkpoint for isolation actions is not therefore competent at incident response; it is merely supervised. The procedural pattern produces scoped, supervised access; the regulatory pattern requires evidenced, qualified capability. The two are not the same, and an investigation following a misuse incident, whether by a CISA incident response team, a FedRAMP 3PAO, or a DoD cyber inspector, will distinguish them sharply.
What the AQ Primitive Provides
Skill gating treats each cybersecurity capability as a work-role-aligned competency that must be earned, evidenced, and maintained against the live threat landscape. The curriculum is expressed as a directed graph of capabilities aligned to NICE Framework work roles: SOC analyst Tier 1 (alert triage, basic correlation, escalation), SOC analyst Tier 2 (investigation, evidence collection, hypothesis testing), threat hunter (proactive hypothesis-driven search across logs and telemetry), incident responder (containment, eradication, recovery actions), vulnerability assessor (authenticated and unauthenticated scanning, scope adherence, impact analysis), penetration tester (controlled exploitation, lateral movement, post-exploitation discipline), and so on through the work-role catalog. Each capability has an evidence schema drawn from NICE knowledge and skill statements, mapped where applicable to the certification objectives that the human workforce is qualified against (Security+, CySA+, CEH, OSCP), and tied to the specific tasks the work role performs.
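The work-role ladder above can be encoded as a directed prerequisite graph. The role names below are the ones listed in the text; the edge structure is an illustrative simplification of a real curriculum.

```python
# Illustrative prerequisite edges: capability -> capabilities it requires.
CURRICULUM = {
    "soc-tier1": [],
    "soc-tier2": ["soc-tier1"],
    "threat-hunter": ["soc-tier2"],
    "incident-responder": ["soc-tier2"],
    "vulnerability-assessor": ["soc-tier2"],
    "penetration-tester": ["vulnerability-assessor"],
}

def unlock_order(curriculum: dict[str, list[str]]) -> list[str]:
    """Topological order via depth-first traversal: every capability appears
    after all of its prerequisites, so no tier can be earned before the
    tiers it depends on."""
    order: list[str] = []
    seen: set[str] = set()
    def visit(cap: str) -> None:
        if cap in seen:
            return
        for pre in curriculum[cap]:
            visit(pre)
        seen.add(cap)
        order.append(cap)
    for cap in curriculum:
        visit(cap)
    return order
```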
Each capability is initially locked. The agent operates at the entry tier, demonstrating alert triage and threat intelligence correlation against a defined evaluation set drawn from real and synthetic incidents. Evidence accumulates as triage decisions, classifications, and escalations that can be evaluated against ground truth and against scope adherence criteria. When the evidence portfolio meets the schema and a qualified human (the SOC manager, the security architect, the red team lead, or the ISSM) countersigns, the capability unlocks the next tier. Investigation unlocks vulnerability assessment, vulnerability assessment unlocks controlled exploitation in authorized scopes, and controlled exploitation unlocks adversary emulation. Disruptive capabilities (network isolation, account disablement, automated containment) are gated separately and require additional evidence of impact-assessment competence and scope discipline, because the failure mode is not a missed detection but an outage induced by a false positive.
Threat-landscape-responsive decay is part of every capability. When new techniques are cataloged in ATT&CK, when new CVEs are added to the CISA Known Exploited Vulnerabilities catalog, when new actor TTPs are reported in CISA advisories or industry intelligence, the agent's certifications for affected capabilities are flagged for re-evaluation. The agent must demonstrate competent detection or response against the new technique, on a defined evaluation harness, before the certification is renewed. Time-based decay continues to apply, aligned to the maintenance cycles of the corresponding human credentials (the three-year CPE cycle for CISSP, the three-year CE cycle for Security+ and CySA+), but it is the threat-driven decay that addresses the failure mode unique to cybersecurity.
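The threat-driven trigger can be sketched as a coverage map from technique IDs to the capabilities that claim to address them, consulted on every intelligence update. T1059 and T1021 are real ATT&CK technique identifiers, but the mapping itself and the certification records are illustrative.

```python
# Which capabilities claim coverage of which ATT&CK techniques (mapping illustrative).
COVERAGE = {
    "T1059": ["soc-tier1", "threat-hunter"],           # Command and Scripting Interpreter
    "T1021": ["threat-hunter", "incident-responder"],  # Remote Services
}

def flag_on_intel_update(new_techniques: list[str],
                         certifications: dict[str, str]) -> set[str]:
    """Mark every certified capability touched by a newly catalogued
    technique as requiring re-evaluation; untouched certifications keep
    their current status, so re-certification demand stays proportional
    to actual landscape change."""
    flagged: set[str] = set()
    for technique in new_techniques:
        for cap in COVERAGE.get(technique, []):
            if cap in certifications:
                certifications[cap] = "re-evaluation-required"
                flagged.add(cap)
    return flagged
```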
Anti-gaming is structural. Because evidence is bound to specific tasks performed against specific scopes with specific outcomes, the agent cannot accumulate credit by repeating easy variants. Because human countersignature is required at each tier promotion and because the countersigner is the qualified human responsible for the work role under DoDD 8140 or the equivalent civilian framework, the agent cannot self-promote. Because the evidence portfolio is the artifact, an investigator following an incident (an internal reviewer, a CISA team, a FedRAMP 3PAO, or a DoD cyber inspector) can audit the basis of every action the agent took, in the same shape they would audit a human practitioner's qualification record.
Compliance Mapping
Each regulatory regime maps cleanly onto the gating structure. The NIST NICE Framework's work roles, knowledge statements, skill statements, and tasks become the evidence schemas for the corresponding capabilities, allowing the agent's qualification record to be enumerated in the same vocabulary the human workforce uses. NIST CSF 2.0's Govern function is satisfied by the gating engine itself: capability is a governed object, capability change is a governance event, and the audit log of unlock and decay events is the governance record. The Identify, Protect, Detect, Respond, and Recover functions map onto specific capability subgraphs, with the agent's authority within each function determined by its currently certified capabilities rather than by a static configuration.
NIST AI RMF's Govern, Map, Measure, and Manage functions map onto the gating engine's lifecycle: capability mapping is the curriculum graph, measurement is the evidence accumulation and threat-landscape evaluation, management is the unlock and decay logic, and governance is the human countersignature at each tier transition. DoDD 8140 work-role qualification requirements are encoded as the evidence schemas for the work-role-aligned capability subgraphs, with the certification-aligned evidence collected against the same competencies the human qualification process uses. ISC2, EC-Council, CompTIA, SANS GIAC, and Offensive Security certification objectives become the alignment references for the corresponding capability evidence, allowing the gating engine to produce a record that maps to the credential the organization is already familiar with. CISA Binding Operational Directives and Known Exploited Vulnerabilities catalog updates become decay triggers on the capabilities that address them. FedRAMP authorization boundaries and CMMC practice requirements become scope constraints encoded in the agent's governance field, enforced at action time rather than at policy review time. The compliance mapping is not a translation layer; it is the native shape of the agent's qualification record.
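Scope constraints "enforced at action time rather than at policy review time" can be sketched as a check run on every tool invocation. The governance table, action names, and CIDR ranges below are hypothetical.

```python
import ipaddress

# Hypothetical governance field: per-action authorization boundaries,
# e.g. a FedRAMP authorization boundary expressed as CIDR scopes.
GOVERNANCE = {
    "vulnerability-scan": {"scopes": ["10.20.0.0/16"]},
    "network-isolate":    {"scopes": ["10.20.4.0/24"]},
}

def authorize(action: str, target_ip: str) -> bool:
    """Deny by default; permit only actions whose target falls inside the
    authorization boundary, evaluated at the moment of invocation."""
    policy = GOVERNANCE.get(action)
    if policy is None:
        return False
    ip = ipaddress.ip_address(target_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in policy["scopes"])
```

Because the check runs per invocation, tightening a scope in the governance field takes effect on the next action rather than at the next policy review.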
Adoption Pathway
Adoption begins inside a security operations center where the work-role structure already aligns with the NICE Framework and where the existing tier-based escalation pattern provides a natural template for the gating graph. The first deployment is the Tier 1 alert triage and correlation capability, where the agent operates against a defined evaluation harness drawn from the SOC's historical alert stream and where success is measured against ground truth labels the SOC team already produces during shift handoffs. The agent earns its Tier 1 certification through demonstrated triage accuracy, correct escalation behavior, and correct false-positive identification, countersigned by the SOC manager. This deployment is sized to be evaluated against existing SOC SLAs (mean time to triage, false-positive rate, escalation accuracy) so that the gating engine's value is measurable in the same units the SOC is already accountable for under its operating-level agreements and under FedRAMP or CMMC reporting where applicable.
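The Tier 1 harness can score the agent with exactly the metrics the SOC already reports. A sketch, with the label vocabulary assumed:

```python
def tier1_metrics(decisions: list[tuple[str, str]]) -> dict[str, float]:
    """decisions: (predicted, actual) label pairs per alert, each label in
    {"benign", "escalate"}, where the actuals are the SOC's ground-truth
    handoff labels. Returns metrics in the SOC's existing SLA units."""
    total = len(decisions)
    correct = sum(p == a for p, a in decisions)
    benign_total = sum(a == "benign" for _, a in decisions)
    false_pos = sum(p == "escalate" and a == "benign" for p, a in decisions)
    return {
        "triage_accuracy": correct / total,
        "false_positive_rate": false_pos / benign_total if benign_total else 0.0,
    }
```

Feeding these numbers into the promotion gate, rather than reporting them after the fact, is what turns an SLA dashboard into a qualification record.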
The second phase extends across active investigation, threat hunting, vulnerability assessment, and controlled incident response actions, layered onto the Tier 1 base. This is the phase where the threat-landscape-responsive decay earns its keep, because the agent now holds capabilities whose effectiveness varies as ATT&CK and CISA KEV evolve, and the decay-driven retraining demand becomes a visible operational signal. It is also the phase where the gating engine begins to produce audit-grade reporting that the CISO can use during board-level cyber-risk reporting and during SEC Item 1.05 disclosure preparation: snapshots of agent capability distributions, decay-driven retraining demand, and incidents in which the agent's action was within or outside its current certification.
The third phase integrates with red team and adversary emulation operations, where the gating engine ensures that the agent's offensive capabilities are proportional to its demonstrated competence and safety awareness, and with FedRAMP-authorized environments and DoDD 8140-governed workforces, where the gating engine becomes the AI workforce's qualification ledger sitting alongside the human workforce's. At this scale the gating engine is no longer a deployment tool but the system of record for AI cybersecurity qualification, feeding the same governance, risk, and compliance pipelines that already track the human staff. The AI agent remains the operational surface, but the gating engine is what makes its participation in cybersecurity operations defensible to a CISA incident responder, a FedRAMP 3PAO, a DoD cyber inspector, or an SEC examiner reviewing a cyber incident disclosure, on the same evidentiary terms those audiences already apply to the qualified human practitioners working alongside it.