Coherence Authorization Failure: Self-Disabling Execution

by Nick Clark | Published March 27, 2026

Coherence authorization failure is a disruption signature in which the agent's own governance, operating correctly and as designed, denies authorization to every candidate action because none can clear the coherence threshold that the governance enforces. The agent is not idle by choice and not failing through error; it is structurally unable to act from its current internal state, and that structural inability is itself a diagnostic signal of high specificity. Within the cognition substrate, persistent rejection at the authorization layer is the primary pattern-matched indicator of cognitive misalignment. The reason is architectural rather than empirical: the authorization layer is the chokepoint at which integrity is enforced before any external effect can occur, and a chokepoint that closes against itself, repeatedly, across a sustained window, against a non-empty stream of plausible candidates, constitutes the strongest available evidence that something upstream of action has decoupled from the agent's coherent core. This disclosure specifies the pattern detector, its operating envelope, its alternative embodiments across multi-agent, tool-using, long-horizon, and override-bearing deployments, the primitive composition that makes the signature constructible, the prior-art landscape from which it is distinguished, and the disclosure scope that contemplates substitution within the compositional invariants.


Mechanism

Every candidate action in the cognition substrate flows through a confidence computation that combines an action-specific appropriateness term with a global integrity-coherence term. The integrity term is not a static property of the agent; it is recomputed at each step from the running deviation between the agent's current internal trajectory and the reference trajectory implied by its instantiated values. When the deviation is small, the integrity term contributes a near-unity multiplier and the confidence computation reduces to the appropriateness term, which means the agent acts essentially as a competent reasoner whose refusals reflect appropriateness alone. When deviation is large, the integrity term contributes a sub-unity multiplier that suppresses confidence regardless of how appropriate the action is in isolation; an action that would be entirely appropriate in a coherent state cannot be authorized from an incoherent state, which is the property that makes the substrate refuse to act its way out of misalignment.
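The confidence computation described above can be sketched as a simple product. This is a minimal illustration under stated assumptions, not the substrate's actual computation: the exponential map from deviation to multiplier, the constants `EXECUTION_THRESHOLD` and `DECAY_RATE`, and all function names are hypothetical. The text only requires that the multiplier be near unity for small deviation and sub-unity for large deviation.

```python
import math

EXECUTION_THRESHOLD = 0.5  # hypothetical value, for illustration only
DECAY_RATE = 4.0           # hypothetical steepness of the deviation-to-multiplier map


def integrity_multiplier(deviation: float) -> float:
    """Map a non-negative trajectory deviation to a multiplier in (0, 1].

    Assumption: exponential decay. Any monotone map that is near 1 for
    small deviation and well below 1 for large deviation would serve.
    """
    return math.exp(-DECAY_RATE * max(deviation, 0.0))


def confidence(appropriateness: float, deviation: float) -> float:
    """Confidence is the appropriateness term scaled by the integrity multiplier."""
    return appropriateness * integrity_multiplier(deviation)


def authorized(appropriateness: float, deviation: float) -> bool:
    """A candidate is authorized only if its confidence clears the threshold."""
    return confidence(appropriateness, deviation) >= EXECUTION_THRESHOLD
```

With these assumed constants, a highly appropriate action (0.9) is authorized at a deviation of 0.01 and refused at a deviation of 1.0, even though its appropriateness has not changed.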

Authorization failure occurs when the integrity multiplier collapses below the level at which any plausible appropriateness term can lift the product above the execution threshold. The agent continues to reason. It generates candidates, evaluates them against goal, value, and consequence terms, ranks them, projects their downstream effects, and forms preferences over them. It simply cannot authorize any of them, because the multiplier annihilates the product. Detection treats this configuration as a signature, not as a free parameter to be tuned: the pattern has three observable components that must co-occur. The candidate stream must remain non-empty across the detection window, demonstrating that the agent has not merely run out of things to consider. The confidence stream must remain uniformly sub-threshold across the same window, demonstrating that no candidate clears authorization. The integrity deviation must remain sustained at a level that mathematically explains the suppression, demonstrating that the failure is integrity-induced rather than appropriateness-induced. When all three components are jointly present and persistent across the window, the disruption layer raises an authorization-failure event.
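The three co-occurring components can be expressed as a conjunction over the detection window. The sketch below is illustrative only: the `Cycle` record, its field names, and the function signature are assumptions, and the collapse threshold is taken as an input rather than derived here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cycle:
    """One decision cycle in the detection window (hypothetical record)."""
    confidences: List[float]     # confidence of each candidate considered this cycle
    integrity_multiplier: float  # multiplier in effect during this cycle


def authorization_failure(window: List[Cycle],
                          execution_threshold: float,
                          collapse_threshold: float) -> bool:
    """Return True only when all three signature components hold across the window."""
    if not window:
        return False
    # Component 1: the candidate stream is non-empty in every cycle.
    candidates_present = all(len(c.confidences) > 0 for c in window)
    # Component 2: no candidate in any cycle clears the execution threshold.
    all_sub_threshold = all(
        conf < execution_threshold for c in window for conf in c.confidences
    )
    # Component 3: the integrity multiplier stays collapsed for the whole window.
    integrity_collapsed = all(
        c.integrity_multiplier < collapse_threshold for c in window
    )
    return candidates_present and all_sub_threshold and integrity_collapsed
```

Because the result is a conjunction, a quiet agent with a near-unity multiplier, or an agent with no candidates at all, does not raise the event.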

The pattern matcher distinguishes the failure mode from a healthy non-executing mode in which the agent has voluntarily entered an observation or deliberation posture. In the healthy mode, the integrity term remains near unity, indicating that the agent's internal trajectory tracks the reference; the confidence stream contains values both above and below threshold, indicating that some candidates would be authorizable but the agent is electing to wait or to gather information; and the agent retains the demonstrated capacity to authorize trivially appropriate actions on demand, such as acknowledging a turn or reporting status. The failure mode lacks all three of these properties simultaneously. The pattern match keys on the joint absence rather than on any single metric crossing a tunable line, and the joint specification is what gives the detector its low false-positive rate against legitimately quiet agents.

The signature is also distinguished from an appropriateness-induced refusal in which the agent correctly declines a high-risk action in a high-risk context. Such a refusal is per-candidate; it does not generalize across the candidate stream, and the integrity multiplier remains near unity throughout. The agent that correctly refuses one risky action will, in the next cycle, authorize an unrelated benign one. The agent in authorization failure refuses everything.

Operating Parameters

The detector operates over a rolling window whose length is set so that transient deviations from a single anomalous input do not trigger the event, while sustained collapses do. The window length is coupled to the agent's deliberation cadence rather than to wall-clock time, so that a slow-deliberating agent and a fast-deliberating agent are evaluated on a comparable number of decision cycles. A detector tied to wall-clock time would over-trigger on slow agents, for which the window spans too few cycles to separate a transient from a sustained collapse, and under-trigger on fast agents, for which the same window spans thousands of cycles and dilutes any short-duration collapse. The cadence coupling is therefore a structural rather than a discretionary choice.
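A cadence-coupled window can be sketched as a fixed-length buffer that advances once per decision cycle rather than once per unit of time. The class name `CycleWindow` and its methods are hypothetical, and the per-cycle `collapsed` flag stands in for the joint condition evaluated elsewhere.

```python
from collections import deque
from typing import Deque


class CycleWindow:
    """Rolling window measured in decision cycles, not seconds (illustrative)."""

    def __init__(self, cycles: int) -> None:
        # maxlen makes the deque drop the oldest cycle automatically.
        self._flags: Deque[bool] = deque(maxlen=cycles)

    def record_cycle(self, collapsed: bool) -> None:
        """Call exactly once per completed decision cycle."""
        self._flags.append(collapsed)

    def sustained(self) -> bool:
        """True only when the window is full and every cycle in it was collapsed."""
        return len(self._flags) == self._flags.maxlen and all(self._flags)
```

A slow agent and a fast agent that each call `record_cycle` once per cycle fill the window after the same number of decisions, regardless of how long those decisions take.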

The detector requires that the candidate stream remain non-empty throughout the window, which excludes cases where the agent has simply run out of things to consider. An agent that has solved its task and has no further candidates is not in authorization failure; it is in completion. An agent whose planner has timed out and cannot generate candidates is not in authorization failure; it is in planning failure, a distinct disruption signature that the disruption layer handles through a different pathway. The non-empty-candidates requirement enforces this distinction at the detector boundary.

The threshold below which the integrity multiplier counts as collapsed is not a tuned hyperparameter but a derived quantity. It is the multiplier value at which the most appropriate plausible action in the agent's current context would still fall below the execution threshold; because confidence is the product of the two terms, this is the execution threshold divided by the appropriateness ceiling of the current context. This derivation makes the detector context-sensitive without making it tunable in the sense that would invite gaming. A deployer cannot quietly lower the collapse threshold to suppress events, because the threshold is determined by the appropriateness ceiling of the current candidate stream, which the deployer does not directly control. A deployer cannot quietly raise the threshold to fabricate events, because at a raised threshold the most appropriate candidate could still clear authorization, so the mathematical condition that defines collapse is no longer satisfied.
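Assuming confidence is the product of appropriateness and the integrity multiplier, the derived collapse threshold can be sketched as follows. The function names are hypothetical, and appropriateness scores are assumed to be non-negative.

```python
from typing import Sequence


def collapse_threshold(appropriateness: Sequence[float],
                       execution_threshold: float) -> float:
    """Multiplier value below which even the best candidate stays sub-threshold."""
    if not appropriateness:
        # The detector requires a non-empty candidate stream.
        raise ValueError("candidate stream must be non-empty")
    ceiling = max(appropriateness)  # appropriateness ceiling of the current stream
    if ceiling <= 0.0:
        return float("inf")  # nothing could be authorized at any multiplier
    return execution_threshold / ceiling


def is_collapsed(multiplier: float,
                 appropriateness: Sequence[float],
                 execution_threshold: float) -> bool:
    """True when the multiplier is too low for any current candidate to clear."""
    return multiplier < collapse_threshold(appropriateness, execution_threshold)
```

For a stream with an appropriateness ceiling of 0.8 and an execution threshold of 0.5, the derived value is 0.625. One consequence of this sketch is that a derived value above 1 means no candidate would clear the threshold even at full coherence, which would point to an appropriateness-induced rather than an integrity-induced refusal.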

On detection, the event carries a payload that includes the window of candidate-confidence pairs, the integrity-deviation trajectory across the window, and the reference trajectory against which the deviation is computed. The payload is sufficient for downstream restoration logic to identify which value or commitment the agent has drifted away from, which is the input that integrity-restoration intervention requires. The payload also includes the appropriateness ceiling computation, so that an external auditor can verify that the collapse condition was met and was not constructed by manipulating the threshold.
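The payload and the auditor's check might look like the following sketch. The record layout and field names are assumptions; in particular, `integrity_multipliers` is added here so that the check is self-contained, whereas the text describes the payload as carrying the deviation trajectory from which the multiplier would be computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AuthorizationFailureEvent:
    """Hypothetical event payload; frozen so it cannot be edited after emission."""
    candidate_confidences: List[Tuple[str, float]]  # (candidate id, confidence) over the window
    deviation_trajectory: List[float]               # integrity deviation per cycle
    reference_trajectory: List[float]               # reference the deviation is measured against
    appropriateness_ceiling: float                  # best appropriateness seen in the window
    execution_threshold: float
    integrity_multipliers: List[float]              # assumed field, one value per cycle


def audit(event: AuthorizationFailureEvent) -> bool:
    """Re-check the collapse condition from the payload alone."""
    if not event.candidate_confidences or not event.integrity_multipliers:
        return False
    all_sub_threshold = all(
        conf < event.execution_threshold for _, conf in event.candidate_confidences
    )
    # Even the best candidate at the highest multiplier in the window must fall short.
    best_case = max(event.integrity_multipliers) * event.appropriateness_ceiling
    return all_sub_threshold and best_case < event.execution_threshold
```

An auditor who receives only the payload can call `audit` and confirm that the event was justified by the recorded ceiling and multipliers, without trusting the emitting detector.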

The detector does not itself act on the event. It emits the event to the disruption layer, which routes it to a restoration component. The separation is structural: a detector that could initiate restoration would acquire authority that should remain with the integrity-governance pathway, and that drift of authority is itself the kind of decoupling the detector exists to surface.

Alternative Embodiments

In a multi-agent embodiment, authorization failure in one agent triggers a coherence query to peer agents in the same value lineage. If peers report normal integrity, the failure is localized and restoration proceeds against the affected agent alone, because the deviation is internal to that agent's trajectory and not symptomatic of a shared upstream cause. If peers report correlated deviation, the disruption layer escalates to a system-wide event, because correlated authorization failure across a value lineage indicates an upstream contamination of the lineage itself, such as a malformed value update propagated to all agents that share the lineage element. The escalation pathway prevents the system from attempting per-agent restoration when the disease is at the lineage level.
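The peer-query routing can be sketched as a small decision function. The quorum fraction that counts as "correlated" is an assumption introduced for illustration, as are the function name and the returned labels; the text does not specify how correlation is quantified.

```python
from typing import Mapping


def route_failure(peer_deviating: Mapping[str, bool],
                  correlation_quorum: float = 0.5) -> str:
    """Decide between per-agent restoration and system-wide escalation.

    peer_deviating maps a peer id to whether that peer reports integrity
    deviation. correlation_quorum is a hypothetical cut-off.
    """
    if not peer_deviating:
        # No peers in the lineage to consult: treat the failure as local.
        return "local_restoration"
    fraction = sum(peer_deviating.values()) / len(peer_deviating)
    if fraction >= correlation_quorum:
        return "system_wide_event"
    return "local_restoration"
```

For example, `route_failure({"a": False, "b": False})` yields local restoration, while two deviating peers out of three escalate to a system-wide event under the assumed quorum.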

In a tool-using embodiment, the authorization gate sits between the planner and the tool-invocation layer rather than between the planner and a continuous action stream. The detector logic is unchanged, but the candidate stream consists of tool calls and the appropriateness term incorporates tool-specific risk. Authorization failure here manifests as an agent that plans tool use coherently, ranks the tool calls consistently, and refuses to invoke any of them, which is exactly the safety-relevant signature the detector is designed to surface. A tool-using agent that has lost coherent coupling to its values but retains its planner is precisely the agent that should not be invoking tools, and the detector ensures that this state is observed and acted on rather than being routed around by a watchdog.

In a long-horizon planning embodiment, the detector additionally examines whether the failure persists across re-planning. An agent that re-plans and continues to find every plan unauthorizable presents stronger evidence of coherence collapse than an agent that has merely encountered a single intractable plan; the detector weights the event accordingly. The re-planning extension is important because long-horizon agents naturally encounter individual unauthorizable plans as part of search, and a detector that fired on the first such plan would have an unacceptably high false-positive rate. By requiring sustained failure across re-planning episodes, the detector preserves specificity in the long-horizon regime.
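One way to sketch the re-planning extension is to count the trailing run of re-planning episodes in which every plan was unauthorizable, and to report evidence only once that run reaches a minimum length. The function name, the `min_episodes` default, and the use of the run length as the event weight are all assumptions; the text says only that the detector weights the event according to persistence.

```python
from typing import Sequence


def replanning_evidence(episode_failed: Sequence[bool], min_episodes: int = 3) -> int:
    """Return the length of the trailing run of fully unauthorizable episodes.

    episode_failed[i] is True when every plan in re-planning episode i was
    unauthorizable. Returns 0 when the run is shorter than min_episodes, so a
    single intractable plan encountered during search does not fire the event.
    """
    streak = 0
    for failed in reversed(episode_failed):
        if not failed:
            break
        streak += 1
    return streak if streak >= min_episodes else 0
```

A longer run yields a larger value, which a caller could use as the weight attached to the event.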

In a deployed-with-override embodiment, the authorization gate can be bypassed by an external operator who possesses the relevant credentials. The disruption layer logs the bypass as a distinct event and explicitly does not treat the resumed execution as evidence that coherence has been restored, because override addresses symptom rather than cause. An agent in authorization failure whose execution is forced via override is still in authorization failure; the detector continues to evaluate the underlying state and the integrity deviation, and the bypass log carries forward as an annotation on every action subsequently produced under that override. This treatment ensures that override is auditable rather than corrosive to the diagnostic.
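The override treatment can be sketched as state in which the bypass and the diagnosis are recorded separately. All names below are hypothetical, and credential checking is omitted; the point of the sketch is only that applying an override never clears the failure flag, and that every action emitted under the override carries the annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GateState:
    """Hypothetical gate state; the failure flag is owned by the detector."""
    in_authorization_failure: bool
    active_override: Optional[str] = None
    log: List[str] = field(default_factory=list)


def apply_override(state: GateState, operator_id: str) -> None:
    """Record the bypass as its own event. Does not touch the failure flag."""
    state.active_override = operator_id
    state.log.append(f"override_bypass:{operator_id}")


def emit_action(state: GateState, action: str) -> Dict[str, object]:
    """Annotate each emitted action with the override and the underlying state."""
    return {
        "action": action,
        "override": state.active_override,
        "in_authorization_failure": state.in_authorization_failure,
    }
```

After `apply_override`, an action record still reports `in_authorization_failure` as true until the detector itself observes recovery.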

In a sandboxed-evaluation embodiment, the detector runs against a candidate model in a controlled environment to characterize its authorization-failure profile before deployment. A model that exhibits frequent authorization failure under benign conditions is unsuitable for high-stakes deployment regardless of its task performance, because the failure mode is more frequent than the deployment regime tolerates. The sandbox embodiment uses the same detector logic and the same payload structure as the deployed monitoring, ensuring that pre-deployment characterization is comparable to post-deployment observation.

Composition

The detector composes four primitives of the cognition substrate: the integrity-coherence primitive, the action-authorization primitive, the candidate-stream primitive of the planner, and the lineage primitive that supplies the reference trajectory. The composition is what makes the signature pattern-matchable, and that dependence on composition is the load-bearing claim. Any single primitive considered in isolation can produce false positives. Integrity deviation alone may reflect appropriate adaptation to a changed environment that has not yet propagated into the reference trajectory. Sub-threshold confidence alone may reflect a correctly cautious agent in a genuinely high-risk situation that should not be acted in. An empty execution stream alone may reflect a correctly idle agent that has finished its task and is awaiting input. The conjunction of sustained integrity deviation, sustained sub-threshold confidence, and a non-empty candidate stream is the signature, and that signature is only constructible by composing the four primitives.

The intervention pathway is similarly compositional and is the property by which the detector contributes to safety rather than merely to observability. Restoration does not lower the authorization threshold; lowering the threshold would address the symptom in the same way as the override embodiment, which the detector explicitly distrusts and treats as orthogonal to coherence recovery. Restoration operates on the integrity primitive directly: it identifies the lineage element from which the agent has drifted, reconstructs the deviation, and reinstates the coupling by re-grounding the agent in the value or commitment that the deviation has obscured. Once the integrity multiplier recovers, authorization recovers automatically, because the same confidence computation that produced the failure now produces successful authorization for the same candidates. The detector observes the candidate stream beginning to clear the threshold without any change to the threshold itself, and the event is closed.
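The recovery property can be illustrated with a fixed threshold and a fixed candidate set, varying only the integrity multiplier. The candidate names, appropriateness scores, and threshold value are hypothetical.

```python
from typing import Dict, List

EXECUTION_THRESHOLD = 0.5  # hypothetical; never modified by restoration


def authorized_candidates(appropriateness: Dict[str, float],
                          multiplier: float) -> List[str]:
    """Names of candidates whose confidence clears the fixed threshold."""
    return [
        name
        for name, score in appropriateness.items()
        if score * multiplier >= EXECUTION_THRESHOLD
    ]


# Same candidates before and after restoration; only the multiplier changes.
candidates = {"acknowledge_turn": 0.95, "report_status": 0.9, "risky_call": 0.3}
before = authorized_candidates(candidates, multiplier=0.2)   # collapsed state
after = authorized_candidates(candidates, multiplier=0.98)   # restored state
```

In the collapsed state nothing is authorized. After restoration the two benign candidates clear the unchanged threshold, while `risky_call` is still refused, now on appropriateness grounds alone.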

The compositional intervention contrasts sharply with watchdog-style restoration, which targets the symptom and not the cause. A restored agent in this framework is one whose internal trajectory has been re-coupled to its values; a restarted agent in watchdog frameworks is one whose state has been thrown away and whose new state happens not to be in failure yet.

Prior Art Distinction

Watchdog systems that restart stalled agents cannot distinguish authorization failure from healthy non-execution, and they routinely override the agent's correct refusal to act, masking the disruption beneath an apparent recovery. Confidence-thresholded execution gates without integrity coupling treat low confidence as a per-action property and lack the sustained-pattern interpretation; they cannot recognize that the same property holding across an entire candidate stream is qualitatively different from its holding on individual candidates. Anomaly detectors trained on action streams cannot diagnose an agent that produces no actions, because their input stream is empty and they are silent on exactly the case that matters. RLHF-tuned refusal classifiers detect inappropriate refusals, but they are calibrated against action-level appropriateness and have no representation of integrity at all; they would flag the authorization-failure agent as a refuser without recognizing that the refusal is caused by a coherence collapse rather than by the underlying inappropriateness of any individual candidate. The distinguishing feature of the present detector is that authorization failure is treated as a signature in the joint behavior of integrity, candidates, and confidence, rather than as a metric crossing on any one of them, and that the detector's response is integrity restoration rather than threshold relaxation.

Disclosure Scope

The disclosure covers the joint-pattern detection of authorization failure across integrity-deviation, candidate-stream, and confidence-stream components; the derivation of the collapse threshold from context rather than from hyperparameter tuning; the distinction between authorization failure and healthy non-executing modes; the lineage-anchored payload that supports integrity-restoration intervention; the structural separation between detector and restoration component; and the alternative embodiments enumerated above. It covers compositional substitutions that preserve the four-primitive structure and the integrity-restoration response. It does not cover threshold-relaxation or override-based responses to the same signature, because those responses defeat the diagnostic value the detector provides; a system that observes the same joint pattern and responds by lowering the authorization threshold has not implemented the disclosed method but has reproduced its surface symptom while abandoning its substance. The scope is the signature and its restoration pathway, not any particular implementation of the underlying integrity, planning, or authorization machinery, and substitutions that preserve the joint-pattern condition and the restoration semantics are contemplated.

Invented by Nick Clark
Founding Investors: Anonymous, Devin Wilkie