Security and Drift Detection Layer
by Nick Clark | Published March 27, 2026
A mandatory in-skill security layer comprising input sanitization, output filtering, and a fan-out limiter. The disclosure expressly conditions skill executability on the presence and operability of this layer, such that any skill whose security layer is disabled, bypassed, or absent is structurally non-executable by the gating runtime.
Mechanism
The security layer is not an external firewall surrounding skills; it is a structural sub-component of the skill itself, declared in the skill manifest and instantiated by the gating runtime at the moment a skill descriptor is admitted into the agent's executable registry. Each skill, as defined in the cognition patent's skill-gating chapter, carries three obligatory security artefacts: an input-sanitization function, an output-filtering function, and a fan-out limiter. Absence of any one of these artefacts causes the gating loader to refuse registration; the skill is marked non-executable and any invocation request returns a structural-rejection token rather than an error string, preserving the audit invariant that no unprotected skill ever produced a side effect.
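By way of illustration only, the following sketch shows how a registration gate of this kind might refuse an incomplete manifest. All identifiers (SkillManifest, register_skill, StructuralRejection) are hypothetical; the disclosure does not fix an API.

```python
# Minimal registration-gate sketch. All names are illustrative.
from dataclasses import dataclass
from typing import Callable, Optional

REQUIRED_ARTEFACTS = ("sanitize_input", "filter_output", "fan_out_limiter")

@dataclass
class SkillManifest:
    name: str
    sanitize_input: Optional[Callable] = None
    filter_output: Optional[Callable] = None
    fan_out_limiter: Optional[Callable] = None

@dataclass(frozen=True)
class StructuralRejection:
    """Typed token returned instead of an error string, preserving the
    audit invariant that no unprotected skill ever produced a side effect."""
    skill: str
    missing_artefact: str

def register_skill(manifest: SkillManifest, registry: dict):
    for artefact in REQUIRED_ARTEFACTS:
        if getattr(manifest, artefact) is None:
            # Loader refuses registration: the skill is never executable.
            return StructuralRejection(manifest.name, artefact)
    registry[manifest.name] = manifest
    return manifest
```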
The input-sanitization function is invoked before the skill body is dispatched. It receives the raw argument bundle assembled by the planner, together with the calling agent's lineage handle and the active policy reference. Sanitization performs four sequential operations: type normalization against the skill's declared argument schema; provenance verification, in which every argument's lineage is checked for an unbroken chain back to a sanctioned origin; injection-pattern screening, which compares string-typed arguments against the active prompt-injection signature set; and resource-bound resolution, which clamps any numeric argument to the skill's declared operating range. Arguments that fail any check are not silently coerced; the entire invocation is aborted and the failing predicate is recorded.
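A minimal sketch of the four-step check order follows, assuming simplified schema, lineage, and signature-set structures; only the sequence of operations is taken from the disclosure.

```python
# Sketch of the four sequential sanitization checks. Data shapes are
# assumptions; the check order (types, provenance, injection, bounds) is not.
import re

def sanitize_input(args: dict, lineage: dict, profile: dict):
    schema = profile["arg_schema"]                # declared argument schema
    signatures = profile["injection_signatures"]  # active signature set
    bounds = profile["numeric_bounds"]            # declared operating ranges

    for name, value in args.items():
        # 1. Type normalization against the declared schema;
        #    undeclared arguments are treated as schema violations.
        expected = schema.get(name)
        if expected is None or not isinstance(value, expected):
            return ("reject", f"type:{name}")
        # 2. Provenance verification: unbroken chain to a sanctioned origin
        #    (modeled here as a single label, an assumption).
        if lineage.get(name) != "sanctioned":
            return ("reject", f"provenance:{name}")
        # 3. Injection-pattern screening on string-typed arguments.
        if isinstance(value, str) and any(re.search(p, value) for p in signatures):
            return ("reject", f"injection:{name}")
        # 4. Resource-bound resolution: clamp numerics to the declared range.
        if isinstance(value, (int, float)) and name in bounds:
            lo, hi = bounds[name]
            args[name] = min(max(value, lo), hi)
    return ("accept", args)
```

Note that any failed check aborts the whole invocation and records the failing predicate; only the bound-resolution step ever modifies a value.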
The output-filtering function executes on every value the skill returns before that value is emitted into the agent's working memory or used to satisfy a downstream skill's input. Filtering is bidirectional: the function may strip fields that exceed the skill's declared egress schema, redact tokens that match secret-marker patterns, downgrade trust labels on values whose provenance the skill cannot certify, and refuse the entire output if any invariant is violated. The filtered output, together with a hash of the filter's decision trace, is what enters the agent's canonical state; the unfiltered output is discarded and is never persisted.
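The sketch below illustrates one way such a filter might strip, redact, and downgrade; the secret-marker patterns, field layout, and empty-egress invariant are illustrative assumptions, not part of the disclosure.

```python
# Output-filter sketch: strip fields beyond the egress schema, redact
# secret markers, downgrade uncertifiable trust labels, and refuse the
# whole output on invariant breach.
import hashlib
import json
import re

SECRET_MARKERS = [r"AKIA[0-9A-Z]{16}", r"-----BEGIN [A-Z ]*PRIVATE KEY-----"]

def filter_output(raw: dict, egress_schema: set, certified: set):
    trace, filtered = [], {}
    for field, value in raw.items():
        if field not in egress_schema:
            trace.append(("strip", field))      # exceeds declared egress
            continue
        if isinstance(value, str):
            for marker in SECRET_MARKERS:
                if re.search(marker, value):
                    value = re.sub(marker, "[REDACTED]", value)
                    trace.append(("redact", field))
        trust = "certified" if field in certified else "downgraded"
        if trust == "downgraded":
            trace.append(("downgrade", field))  # provenance not certifiable
        filtered[field] = {"value": value, "trust": trust}
    trace_hash = hashlib.sha256(json.dumps(trace).encode()).hexdigest()
    if not filtered:               # illustrative invariant: refuse empty egress
        return None, trace_hash
    # Only the filtered output and the decision-trace hash persist;
    # the unfiltered value is discarded and never written anywhere.
    return filtered, trace_hash
```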
The fan-out limiter governs how many downstream invocations a single skill execution may trigger. Each skill declares, in its manifest, a maximum fan-out integer and an optional fan-out budget that is consumed across a session. The limiter is checked at the moment the skill attempts to enqueue a child invocation; if either the per-invocation maximum or the cumulative budget would be exceeded, the child invocation is refused and a fan-out-exhaustion record is written to lineage. This prevents a compromised or hallucinating skill from amplifying a single prompt into an unbounded cascade of side-effecting calls.
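A minimal limiter sketch, assuming a per-call counter that resets between invocations and a session budget that never does:

```python
# Fan-out limiter: per-invocation maximum plus cumulative session budget,
# checked at child-enqueue time. Lineage is modeled as a plain list.
class FanOutLimiter:
    def __init__(self, per_call_max: int, session_budget: int):
        self.per_call_max = per_call_max
        self.session_budget = session_budget
        self.this_call = 0

    def try_enqueue(self, child: str, lineage: list) -> bool:
        if self.this_call >= self.per_call_max or self.session_budget <= 0:
            # Refusal is recorded, not raised: the cascade simply stops.
            lineage.append({"event": "fan-out-exhaustion", "child": child})
            return False
        self.this_call += 1
        self.session_budget -= 1
        return True

    def reset_call(self):
        """The per-invocation counter resets; the session budget does not."""
        self.this_call = 0
```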
The three artefacts are bound together by the gating runtime's executability predicate. The predicate is evaluated each time the skill is selected for invocation, not merely at registration; a skill whose security layer has been disabled mid-session, whose sanitization function references a revoked policy version, or whose limiter budget has been exhausted is downgraded to non-executable and removed from the active skill set until the predicate is re-satisfied. This re-evaluation is the structural equivalent of a per-call security check, but performed declaratively against the skill's own manifest rather than imperatively in the calling agent.
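One possible rendering of the predicate follows, with policy revocation modeled as set membership; the disclosure leaves the policy representation and manifest fields open.

```python
# Executability predicate, evaluated at every selection rather than
# only at registration. manifest.policy_version is an assumed field.
def is_executable(manifest, policy, limiter) -> bool:
    artefacts_present = all(
        getattr(manifest, a, None) is not None
        for a in ("sanitize_input", "filter_output", "fan_out_limiter")
    )
    policy_current = manifest.policy_version not in policy["revoked_versions"]
    budget_left = limiter.session_budget > 0
    return artefacts_present and policy_current and budget_left

def select_skill(name, registry, active_set, policy, limiter):
    manifest = registry[name]
    if not is_executable(manifest, policy, limiter):
        active_set.discard(name)  # downgraded until the predicate is re-satisfied
        return None
    return manifest
```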
Operating Parameters
The security layer is parameterized by the skill manifest's security_profile block, which a domain operator authors at skill registration. The block exposes the sanitization signature set version, the maximum permitted argument size, the egress schema reference, the secret-marker pattern list, the per-invocation fan-out maximum, and the cumulative session fan-out budget. Each parameter is policy-governed: the gating runtime resolves the active value by consulting the agent's policy reference at predicate-evaluation time, so a tightening of the global injection-signature set propagates to every skill on its next invocation without requiring redeployment.
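A hypothetical shape for the block, with all keys and values illustrative:

```python
# One possible security_profile block, mirroring the parameters above.
security_profile = {
    "sanitization_signature_set": "injection-sigs@v14",   # resolved via policy
    "max_argument_bytes": 65536,
    "egress_schema_ref": "schemas/ticket-summary@v3",
    "secret_marker_patterns": ["aws-access-key", "pem-private-key"],
    "fan_out_per_invocation_max": 8,
    "fan_out_session_budget": 64,
}
```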
The non-executability condition is itself parameterized. The manifest declares which artefacts are mandatory at the deployment tier in question; a research-tier deployment may permit a skill to omit the fan-out limiter so long as the egress schema remains, while a production tier requires all three. The gating runtime composes the deployment tier with the manifest's declared artefacts and refuses registration whenever a tier-mandatory artefact is absent. The composition is recorded so that an auditor can later reconstruct exactly which artefact set was demanded at the moment of registration.
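A sketch of the tier composition, assuming two tiers with the artefact sets described above:

```python
# Tier composition: the tier's mandatory artefact set is checked against
# the manifest's declared artefacts; the demanded set is recorded so an
# auditor can reconstruct what was required at registration time.
TIER_MANDATORY = {
    "research":   {"sanitize_input", "filter_output"},
    "production": {"sanitize_input", "filter_output", "fan_out_limiter"},
}

def check_tier(tier: str, declared: set, audit_log: list) -> bool:
    demanded = TIER_MANDATORY[tier]
    audit_log.append({"tier": tier, "demanded": sorted(demanded)})
    return demanded <= declared   # every tier-mandatory artefact present
```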
Operating parameters also include the lineage-emission cadence. Each sanitization rejection, output redaction, and fan-out refusal emits a structured record into the agent's lineage at a configurable verbosity level, ranging from summary counts at the lowest level to full per-decision traces at the highest. The cadence is selected per deployment to balance audit fidelity against storage cost, and the gating runtime guarantees that at least the summary cadence is emitted irrespective of operator preference, ensuring a minimum forensic baseline.
Alternative Embodiments
In a first alternative embodiment, the input-sanitization function is composed of multiple chained sanitizers contributed by independent policy authorities, with the gating runtime requiring all sanitizers to accept before the argument bundle is admitted. This embodiment is suited to multi-tenant deployments in which each tenant contributes a tenant-specific sanitizer that runs alongside a platform-wide sanitizer.
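A minimal chaining sketch, assuming sanitizers that return an accept/reject verdict:

```python
# Chained multi-authority sanitizers: every sanitizer must accept.
def chain_sanitizers(sanitizers, args):
    for authority, sanitize in sanitizers:        # e.g. platform, then tenant
        verdict, detail = sanitize(args)
        if verdict != "accept":
            return ("reject", authority, detail)  # any one veto aborts
    return ("accept", None, args)
```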
In a second alternative embodiment, the output filter is replaced by a two-stage filter in which a fast structural filter executes synchronously and a slower semantic filter executes asynchronously, with the asynchronous stage able to retroactively quarantine an emitted value by issuing a lineage-rooted retraction. The retraction propagates to any downstream consumer that has not yet committed.
In a third alternative embodiment, the fan-out limiter is generalized into a resource-cost limiter that accounts for the estimated compute, network, and external-API cost of each child invocation rather than counting invocations directly. The skill manifest declares a cost-budget rather than a count-budget, and the limiter consults a cost oracle at enqueue time.
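A sketch assuming a hypothetical cost-oracle interface:

```python
# Cost-budget variant: the limiter consults a cost oracle per child
# rather than counting invocations.
class CostLimiter:
    def __init__(self, budget: float, oracle):
        self.budget = budget
        self.oracle = oracle  # estimates compute + network + external-API cost

    def try_enqueue(self, child_invocation) -> bool:
        cost = self.oracle.estimate(child_invocation)
        if cost > self.budget:
            return False      # refusal would be recorded to lineage in practice
        self.budget -= cost
        return True
```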
In a fourth alternative embodiment, the executability predicate is extended with a quorum requirement, such that a skill is executable only if a quorum of independent verifiers each report the security layer as intact. This embodiment is suited to safety-critical domains where unilateral attestation by the runtime is insufficient.
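A sketch of the quorum check, with the verifier interface assumed:

```python
# Quorum-attested executability: at least k independent verifiers must
# report the security layer intact before the skill may run.
def quorum_executable(verifiers, manifest, k: int) -> bool:
    attestations = sum(1 for v in verifiers if v.layer_intact(manifest))
    return attestations >= k
```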
In a fifth alternative embodiment, the security layer's input-sanitization function is differentiated by call-site, with distinct sanitizer chains applied depending on which upstream skill or planner branch produced the argument bundle. The differentiation is declared as a routing table in the skill manifest, and the runtime selects the chain at the moment the bundle is admitted. This embodiment is appropriate where call-site context materially changes the threat model, for example when arguments originating from user-authored prompts must be sanitized more strictly than arguments synthesized by an internal planner.
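A sketch of the routing table, with the chain names and fallback behaviour assumed:

```python
# Call-site-differentiated sanitization: the manifest's routing table maps
# the producing call-site to a sanitizer chain, selected at admission.
ROUTING_TABLE = {
    "user_prompt":      ["strict_injection_screen", "pii_screen"],
    "internal_planner": ["schema_check"],
}

def select_chain(call_site: str) -> list:
    # Unknown call-sites fall back to the strictest chain; the disclosure
    # leaves this choice open.
    return ROUTING_TABLE.get(call_site, ROUTING_TABLE["user_prompt"])
```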
In a sixth alternative embodiment, the fan-out limiter and the output filter share a unified post-emission ledger that records every emitted value and every enqueued child invocation, allowing a session-level controller to cancel pending children when emitted values are later retracted. The unified ledger ensures that a downstream consequence of an emitted value can be unwound in concert with the value's retraction rather than as a separate compensation action.
Composition with Other Mechanisms
The security layer composes with trust-weight calibration: a sanitization rejection or output filtering event is itself a calibration signal, contributing to the rolling history that adjusts the skill's trust weight. A skill whose security layer triggers frequently sees its trust weight tightened, raising the threshold its outputs must clear before downstream consumers accept them. The composition ensures that security and trust are not parallel but interlocking systems.
The security layer composes with the skill registry's executability index, which the planner consults when assembling candidate skill chains. Non-executable skills are filtered from candidate sets at planning time, so the security layer's effects are visible upstream of invocation. A planner that cannot find a chain of executable skills satisfying the goal returns a structural-infeasibility report rather than constructing a chain that will fail at the gate.
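A planning-time sketch, reusing the is_executable predicate sketched under Mechanism; the one-skill-per-goal-step chain assembly here is a deliberate simplification.

```python
# Planning-time filtering: non-executable skills never enter the candidate
# set, so gate failures surface as structural infeasibility, not as a
# chain that would fail at the gate.
def candidate_chain(goal_steps, registry, policy, limiter):
    executable = {name: m for name, m in registry.items()
                  if is_executable(m, policy, limiter)}
    chain = [executable.get(step) for step in goal_steps]
    if any(m is None for m in chain):
        return {"status": "structural-infeasibility", "goal": goal_steps}
    return {"status": "chain", "skills": [m.name for m in chain]}
```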
The security layer composes with capability-awareness mechanisms by treating the active capability envelope as an input to the sanitization signature set. Capabilities that have contracted, for example because a binding has lapsed, may strip permissive injection signatures from the active set, raising sanitization strictness automatically.
The security layer composes with arbitration among parallel skills: the arbitrator examines the security-decision traces of each candidate skill's emission and may prefer outputs from skills whose security layers reported clean traces over outputs from skills whose layers reported repeated near-misses, even where both candidates' values would individually pass downstream validators. The composition turns security-layer telemetry into an arbitration signal without requiring the arbitrator to re-run any sanitization logic.
Prior-Art Context
Conventional LLM tool-use frameworks treat security as an external concern: a model emits a tool call, an external validator inspects the call, and the call is forwarded or rejected. The validator is typically stateless, opaque to the audit trail, and cannot be reasoned about as a property of the tool itself. The disclosed mechanism inverts this arrangement by binding the security artefacts to the skill manifest and conditioning executability on their presence, so that security is a structural property of the skill rather than a runtime property of the calling environment.
Existing prompt-injection defenses focus on either pre-prompt filtering or post-generation classification, and treat the LLM as an opaque generator. The disclosed mechanism contributes a per-skill, manifest-bound layer that operates independently of the model and is therefore robust to model substitution, retraining, and prompt-format change.
Output redaction systems in conventional deployments operate at the transport layer or on persisted logs and cannot prevent a downstream skill from consuming an unredacted value. The disclosed output filter executes within the skill's own boundary and produces the only value that ever enters canonical state, eliminating this class of exposure.
Disclosure Scope
The disclosure encompasses any embodiment in which a skill within an LLM-skill-gating runtime is rendered structurally non-executable in the absence, disablement, or bypass of an in-skill security layer comprising at least input sanitization, output filtering, and a fan-out limiter, and in which the executability predicate is evaluated declaratively against the skill manifest both at registration and at each invocation.
The scope extends to embodiments in which the security artefacts are contributed by multiple policy authorities, in which output filtering is performed in synchronous and asynchronous stages with retroactive retraction, in which fan-out is governed by cost rather than count, and in which executability requires quorum attestation. The scope further extends to compositions with trust-weight calibration, planner executability indexes, and capability-envelope-driven sanitization parameterization.
The disclosure does not depend on a specific LLM, sanitizer implementation, or transport substrate; it is a structural arrangement of artefacts and predicates that admits arbitrary substitution of the underlying components so long as the executability invariant is preserved.