Evidence-Based Capability Gating
by Nick Clark | Published March 27, 2026
A skill is not unlocked because a model claims competence; it is unlocked because accumulated, signed evidence — prior performance traces, governance review records, and outcomes from scoped pilots — clears policy thresholds bound to that specific capability. The evidence record is structured, addressable, and audit-bound, so the act of granting a capability is itself a reviewable artifact rather than a configuration toggle.
Mechanism
Evidence-based capability gating is a deterministic admission function that sits between an inventory of latent skills and the runtime surface that exposes them to an agent. Each skill — for example, "draft outbound legal correspondence," "execute multi-step refunds against a billing API," or "author SQL against a production replica" — is registered with a gating descriptor. The descriptor enumerates the categories of evidence that must be present, the minimum sample sizes within each category, the freshness window over which evidence remains admissible, and the credentialing authorities whose signatures count toward the gate. The function returns admit, defer, or deny; admit is the only outcome that mutates the runtime capability surface, and even admit is qualified by the scope conditions under which the evidence was produced.
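The descriptor and admission function described above can be sketched as follows. This is a minimal illustration, not a disclosed implementation; all names (`GatingDescriptor`, `evaluate_gate`, the category strings) are hypothetical, and signature verification is elided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum

class Decision(Enum):
    ADMIT = "admit"
    DEFER = "defer"
    DENY = "deny"

@dataclass(frozen=True)
class EvidenceRecord:
    skill_id: str
    category: str           # "performance" | "governance" | "pilot"
    authority: str          # credentialing authority that signed the record
    produced_at: datetime

@dataclass(frozen=True)
class GatingDescriptor:
    skill_id: str
    required_categories: frozenset   # evidence breadth
    min_depth: dict                  # per-category minimum record counts
    freshness: timedelta             # window over which records stay admissible
    trusted_authorities: frozenset   # whose signatures count toward the gate

def evaluate_gate(desc: GatingDescriptor, records, now: datetime) -> Decision:
    """Deterministic admission: descriptor + records + clock in, decision out."""
    fresh = [r for r in records
             if r.skill_id == desc.skill_id
             and r.authority in desc.trusted_authorities
             and now - r.produced_at <= desc.freshness]
    if not fresh:
        return Decision.DENY    # no admissible evidence at all
    counts = {c: sum(1 for r in fresh if r.category == c)
              for c in desc.required_categories}
    if all(counts[c] >= desc.min_depth.get(c, 0) for c in desc.required_categories):
        return Decision.ADMIT
    return Decision.DEFER       # some admissible evidence, thresholds unmet
```

Note that `admit` requires every required category to clear its depth threshold, while any admissible-but-insufficient evidence yields `defer` rather than `deny`.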
Evidence enters the gating store through three distinct pathways. The first pathway is performance evidence: structured traces emitted by prior task executions, each carrying input class, output, validator outcome, latency, escalation status, and the policy version under which the execution occurred. The second pathway is governance evidence: review records produced by human reviewers, oversight panels, or automated red-team harnesses, each bound to a specific skill identifier and signed by the reviewing authority's credential. The third pathway is pilot evidence: outcomes from scoped, low-blast-radius deployments in which the candidate skill was exercised against a restricted operator population, a synthetic environment, or a shadow-mode replay of production traffic. Each pathway produces a typed evidence record; the gating function evaluates the records against the descriptor without re-deriving the underlying observations.
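The three pathways can be given typed shapes along the following lines. The field names mirror the attributes enumerated above but are illustrative, not a normative schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PerformanceEvidence:
    """Structured trace emitted by a prior task execution."""
    skill_id: str
    input_class: str
    validator_outcome: str     # e.g. "pass" | "fail"
    latency_ms: float
    escalated: bool
    policy_version: str
    produced_at: datetime

@dataclass(frozen=True)
class GovernanceEvidence:
    """Review record signed by the reviewing authority's credential."""
    skill_id: str
    reviewer_credential: str
    finding: str               # e.g. "approve" | "dissent"
    produced_at: datetime

@dataclass(frozen=True)
class PilotEvidence:
    """Outcome from a scoped, low-blast-radius deployment."""
    skill_id: str
    environment: str           # e.g. "shadow" | "synthetic" | "restricted-live"
    outcome: str
    produced_at: datetime
```

Because each record is a closed, typed value, the gating function can evaluate it against the descriptor without re-deriving the underlying observation.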
The function is structurally separated from the model that produces candidate outputs. The model does not write to the evidence store, does not read the gating descriptor, and cannot represent itself as having satisfied the gate. The store is append-only and signed: a record cannot be retroactively altered to make a skill appear qualified, and a revoked credential propagates through the store such that records signed by the revoked authority are demoted in subsequent gate evaluations. The model's role is bounded to producing candidate outputs within the surface that the gate has already authorized; the gate itself is computed in a separate process whose inputs are records, signatures, and policy, not natural-language claims.
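A minimal sketch of the append-only store with revocation demotion follows; the class and method names are hypothetical, and real cryptographic signatures are reduced to an authority label for brevity.

```python
class EvidenceStore:
    """Append-only evidence log; revocation demotes records without altering them."""

    def __init__(self):
        self._log = []          # append-only: no update or delete API is exposed
        self._revoked = set()   # authorities whose signatures no longer count

    def append(self, record: dict):
        self._log.append(record)

    def revoke_authority(self, authority: str):
        # Revocation propagates at evaluation time; the log itself is immutable.
        self._revoked.add(authority)

    def admissible(self):
        # Records signed by a revoked authority are demoted in gate
        # evaluations, but the underlying entries are never rewritten.
        return [r for r in self._log if r["authority"] not in self._revoked]

store = EvidenceStore()
store.append({"skill_id": "refund", "authority": "auditor-a"})
store.append({"skill_id": "refund", "authority": "auditor-b"})
store.revoke_authority("auditor-b")
```

The model never holds a reference to `store`; only the trace-emission and governance pathways write to it, and only the gate process reads `admissible()`.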
When the gating function admits a skill, it emits an admission token that names the skill, the evidence record set considered, the policy version that produced the decision, and the scope envelope under which the admission applies. Downstream policy uses the token rather than re-running the evaluation, which keeps per-call latency bounded and makes the admission itself a citable artifact in audit. When the function denies or defers, it emits a structured rationale: which evidence categories fell short, which records were stale, which signatures were missing or revoked. The rationale is consumable by the operations layer that schedules further pilots or governance review, so the loop from "skill not yet qualified" to "skill qualified" is itself instrumented.
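The two output artifacts can be shaped roughly as follows; field names track the attributes listed above and are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdmissionToken:
    """Citable artifact emitted on admit; downstream policy checks this token."""
    skill_id: str
    evidence_record_ids: tuple   # the exact record set the decision considered
    policy_version: str
    scope_envelope: dict         # operator populations, data classes, environments

@dataclass(frozen=True)
class GateRationale:
    """Structured explanation emitted on defer or deny."""
    skill_id: str
    decision: str                     # "defer" | "deny"
    short_categories: tuple = ()      # categories below their depth thresholds
    stale_record_ids: tuple = ()      # records outside the freshness window
    revoked_authorities: tuple = ()   # signatures no longer admissible
```

Because the token names the evidence set and policy version, an auditor can replay the decision; because the rationale names the shortfalls, the operations layer can schedule exactly the pilots or reviews that would close them.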
Operating Parameters
The descriptor exposes a small set of parameters per skill. Evidence breadth specifies the minimum number of distinct evidence categories that must contribute records; a descriptor that requires breadth of three cannot be satisfied by performance traces alone. Evidence depth specifies the minimum count within each required category, expressed either as raw record counts or as effective sample size after deduplication of near-identical inputs. Freshness windows are expressed as durations relative to the gate evaluation timestamp; records older than the window are excluded without being deleted, so a skill can be re-qualified by accumulating fresh evidence rather than by manipulating retention.
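A sketch of breadth, depth, and freshness in combination, with depth measured as effective sample size after deduplicating near-identical inputs (approximated here by distinct input-class labels; all names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def admissible(records, now, window):
    # Stale records are excluded from evaluation, not deleted from the store.
    return [r for r in records if now - r["produced_at"] <= window]

def breadth_depth_ok(records, required_categories, min_depth):
    by_cat = {}
    for r in records:
        # Effective sample size: near-duplicate inputs collapse to one.
        by_cat.setdefault(r["category"], set()).add(r["input_class"])
    # Breadth and depth together: every required category must contribute,
    # and each must clear its own effective-sample-size threshold.
    return all(len(by_cat.get(c, ())) >= min_depth[c] for c in required_categories)

now = datetime(2026, 3, 27, tzinfo=timezone.utc)
recs = [
    {"category": "performance", "input_class": "refund-small",
     "produced_at": now - timedelta(days=3)},
    {"category": "performance", "input_class": "refund-small",   # near-duplicate
     "produced_at": now - timedelta(days=2)},
    {"category": "performance", "input_class": "refund-large",
     "produced_at": now - timedelta(days=200)},                  # stale
]
fresh = admissible(recs, now, timedelta(days=90))
```

In this example two records survive the freshness window, but because they share an input class the effective depth is one, so a depth requirement of two is not satisfied: raw counts and effective sample size are deliberately not the same number.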
Authority weighting maps each credentialing authority to a contribution factor used when the descriptor requires governance evidence. An internal reviewer credential may contribute at unit weight; an external auditor credential may contribute at a higher weight; a self-asserted credential contributes at zero. Weights interact with depth thresholds rather than replacing them: a single high-weight signature does not by itself satisfy a depth requirement of, for example, three independent reviews. Scope conditions name the operator populations, data classes, and runtime environments under which the admission applies; an admission qualified for "internal staging tenants only" does not transfer to production tenants without a separate evaluation.
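The interaction between weights and depth thresholds can be sketched as follows. The weight values and authority names are illustrative; the structural point is that both conditions must hold independently.

```python
# Contribution factor per authority class; a self-asserted credential is inert.
WEIGHTS = {"internal-reviewer": 1.0, "external-auditor": 2.0, "self-asserted": 0.0}

def governance_satisfied(reviews, min_weight, min_independent):
    """reviews: list of (authority_kind, reviewer_id) pairs."""
    weighted = sum(WEIGHTS.get(kind, 0.0) for kind, _ in reviews)
    # Only non-zero-weight credentials count toward independence.
    independent = len({rid for kind, rid in reviews if WEIGHTS.get(kind, 0.0) > 0})
    # Weights interact with the depth threshold; they do not replace it.
    return weighted >= min_weight and independent >= min_independent
```

A single high-weight external audit can satisfy the weight condition yet still fail the independence condition, which is exactly the behavior the descriptor requires.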
Decay parameters control how the gate behaves as evidence ages. A decay schedule may keep an admission in force for a fixed dwell period and then require a lighter-weight refresh; alternatively, a decay schedule may continuously discount older records so that a skill silently slides from admit to defer if no new evidence is accruing. Defer behavior is itself parameterized: a deferred skill may remain exposed in read-only or shadow mode, may be exposed only to a narrowed operator class, or may be withdrawn entirely until refreshed. The parameter set is small enough to reason about and rich enough to express the operationally meaningful gradations between "fully qualified" and "withdrawn."
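The continuous-discount variant can be sketched with an exponential schedule: each record's contribution halves every half-life, so a skill with only aging evidence slides from admit to defer without any record being deleted. The half-life and threshold here are illustrative.

```python
HALF_LIFE_DAYS = 30.0   # illustrative decay parameter from the descriptor

def effective_depth(record_ages_days):
    """Sum of per-record contributions, each discounted by age."""
    return sum(0.5 ** (age / HALF_LIFE_DAYS) for age in record_ages_days)

def decision(record_ages_days, admit_threshold):
    # No new evidence means the discounted depth drifts below the
    # threshold and the skill silently moves from admit to defer.
    return "admit" if effective_depth(record_ages_days) >= admit_threshold else "defer"
```

Three fresh records comfortably clear a threshold of 2.5; the same three records three months later do not, which is the "silently slides from admit to defer" behavior described above.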
Alternative Embodiments
The evidence store can be embodied as a local append-only log co-located with the agent runtime, as a shared ledger across a fleet of agents that pool evidence under a common credentialing authority, or as a federated store in which each tenant retains custody of its evidence and the gating function operates over a verifiable view rather than the raw records. The federated embodiment is appropriate where evidence cannot leave a tenant boundary; the shared-ledger embodiment is appropriate where pooled evidence is the point, for example in a regulated industry consortium qualifying a common skill set.
Pilots can be embodied as production shadow runs that compare candidate outputs against a held-out incumbent, as synthetic harness runs against curated input distributions, or as bounded live deployments to opt-in operator cohorts. The descriptor does not need to know which embodiment produced a pilot record; it consumes the record's typed fields. This separation lets organizations evolve pilot infrastructure without rewriting gating policy.
Governance evidence can be produced by human review panels, by automated red-team harnesses whose findings are signed by the harness's credential, or by external regulatory attestation. Hybrid embodiments are common: a harness produces a draft finding, a human reviewer countersigns or overrides, and the resulting record carries both signatures with weights that the descriptor combines. The admission token can also be embodied as a short-lived bearer credential, as a long-lived but revocable certificate, or as a per-call attestation that the runtime re-checks against a published revocation list.
Composition With Adjacent Mechanisms
Evidence-based gating composes with the validation pipeline that governs individual proposals at runtime. The gate determines whether a skill may be invoked at all; the validation pipeline determines whether a specific invocation's output is admissible into agent state. The two are intentionally separated: a fully gated skill can still produce a non-admissible output on a given call, and the gate is not weakened by individual rejections so long as the aggregate evidence remains within policy. Conversely, sustained validation failures within a skill's traces feed back into the evidence store as negative performance evidence, which can move the skill from admit to defer without any human action.
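The feedback path from validation outcomes to gate state can be sketched as an aggregate check over recent traces. The failure-rate and minimum-trace thresholds are illustrative, not disclosed values.

```python
def gate_from_traces(traces, max_failure_rate=0.2, min_traces=5):
    """Aggregate recent performance traces into an admit/defer signal."""
    if len(traces) < min_traces:
        return "defer"          # insufficient aggregate evidence to sustain admit
    failures = sum(1 for t in traces if t["validator_outcome"] == "fail")
    rate = failures / len(traces)
    # An individual rejection does not weaken the gate; a sustained
    # failure rate in the aggregate does, with no human action required.
    return "admit" if rate <= max_failure_rate else "defer"
```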
Gating composes with the feedback-asymmetry mechanism described in the companion disclosure: positive evidence accrues at a rate-limited cadence that the descriptor controls, while negative evidence — revocations, governance dissent, validator regressions — acts on the gate immediately. Composition with operator-intent mechanisms allows the scope envelope on an admission to reference operator classes whose intent confidence has itself been governed; a skill admitted only for high-confidence first-party operators will not surface for low-confidence inferred operators.
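The asymmetry can be sketched as an accrual object in which positive events are rate-limited by the descriptor's cadence while negative events apply without delay; the class and its scoring are illustrative.

```python
from datetime import datetime, timedelta, timezone

class AsymmetricAccrual:
    """Positive evidence accrues on a cadence; negative evidence acts immediately."""

    def __init__(self, positive_cadence: timedelta):
        self.positive_cadence = positive_cadence  # min interval between credits
        self.last_credit = None
        self.score = 0

    def record_positive(self, now: datetime):
        # Rate-limited: a burst of successes credits at most once per cadence.
        if self.last_credit is None or now - self.last_credit >= self.positive_cadence:
            self.score += 1
            self.last_credit = now

    def record_negative(self):
        # Revocations, dissent, and validator regressions are never deferred.
        self.score -= 1
```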
Distinction From Prior Art
Conventional capability management in language-model systems treats access as a configuration concern. A skill, tool, or function is enumerated in a manifest; the manifest is loaded; the model is prompted with the manifest; the model selects from the manifest at inference time. Whether the model is competent at the listed capability is, in this pattern, a property assumed by the deployer at manifest-authoring time. The deployer may run evaluations before shipping the manifest, but the evaluations are not bound to the manifest, are not signed, and are not re-checked at gate-evaluation time. There is no structural artifact that ties a specific authorization to a specific evidence record set.
Role-based access control systems do bind authorizations to credentials, but the credentials in conventional RBAC name the requesting principal, not the evidence that the granted capability has been demonstrated. A user with the "refund issuer" role is authorized regardless of whether the agent acting on the user's behalf has any track record of issuing refunds correctly. Evidence-based capability gating shifts the locus of authorization from "who is asking" to "what has been demonstrated, by whom, under what scope, signed by which authority." It is closer in spirit to clinical credentialing or aviation type ratings than to file-system permissions: the artifact authorizing the activity references the record of qualification, not merely the identity of the actor.
Disclosure Scope
Operationally, the mechanism is invoked at three distinct moments. It is invoked at deployment time, when a new agent build registers its skill inventory and the runtime computes which skills the agent is currently authorized to expose. It is invoked at policy-update time, when a descriptor is amended — for example, raising the depth requirement or adding a new evidence category — and previously admitted skills must be re-evaluated against the new descriptor; admissions that no longer satisfy the descriptor transition to defer or deny without runtime intervention. It is invoked on a scheduled cadence, so that ambient changes such as evidence aging out of freshness windows or signing authorities being revoked propagate into the surface without requiring a precipitating event. The same deterministic function services all three invocation modes; the surface state is therefore at every moment a function of (descriptor, evidence store, signature state, clock) and nothing else.
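The determinism claim above can be made concrete with a toy evaluation that is a pure function of its four inputs; the field names are illustrative, and any real embodiment would also verify signatures rather than trust an authority label.

```python
from datetime import datetime, timedelta, timezone

def surface_state(descriptor, store, revoked_authorities, now):
    """Pure function of (descriptor, evidence store, signature state, clock)."""
    fresh_signed = [
        r for r in store
        if r["authority"] not in revoked_authorities
        and (now - r["produced_at"]) <= descriptor["freshness"]
    ]
    # Identical inputs yield identical decisions regardless of whether the
    # call was triggered at deployment, policy update, or scheduled cadence.
    return "admit" if len(fresh_signed) >= descriptor["min_depth"] else "defer"
```

Because the function closes over nothing, re-running it at any of the three invocation moments with the same inputs reproduces the same surface state, and revoking an authority flips the decision with no other change.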
The disclosed mechanism encompasses the gating descriptor format, the typed evidence record schema across performance, governance, and pilot pathways, the deterministic admission function and its admit/defer/deny outputs, the admission-token artifact and its scope envelope, the rationale artifact emitted on defer or deny, and the feedback paths by which runtime traces re-enter the evidence store. The mechanism is independent of any specific underlying language model, of any specific evaluation harness, and of any specific credential format; the descriptor is the structural commitment, and equivalent embodiments that preserve the typed record set, the deterministic evaluation, and the signed admission are within scope. The mechanism does not claim novelty in language modeling itself; it claims novelty in the structural binding between accumulated, signed evidence and the runtime exposure of model-driven capability.