Positive and Negative Symptom Analogs in Containment Failure

Nick Clark

by Nick Clark | Published March 27, 2026

Containment failure in a credentialed cognitive system manifests in two opposite directions, mirroring the positive-negative symptom distinction long established in clinical psychiatry. Containment leakage, in which speculative content escapes into verified state without traversing the standard promotion and validation surface, produces positive symptom analogs: hallucinated beliefs treated as observed fact, confabulated memories indistinguishable from witnessed events, and actions executed against unverified assumptions. Containment excess, in which all speculation is suppressed and the planning surface is starved of admissible hypotheses, produces negative symptom analogs: flat affect across the agent's expressive surfaces, inability to plan beyond immediate observation, and behavioral withdrawal from any task that would require committing to unverified intermediate state. The disclosure within the Cognition Patent treats this bidirectional axis as the primary diagnostic for cognitive disruption and as the basis for direction-specific intervention.

Mechanism: Two Failure Modes Of A Single Containment Surface

A credentialed cognitive architecture maintains a containment surface that separates speculative content — hypothetical world states, planning graph branches, candidate actions, draft beliefs — from verified content that has been promoted through the validation pathway and committed to verified memory. The surface is not a binary firewall; it is a credentialed admissibility boundary that admits content from speculation into verification only when explicit promotion criteria have been met (corroborating observation, internal consistency check, source-attestation match, or whatever combination the agent's profile specifies).
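The admissibility boundary described above can be sketched as a simple gate. This is a minimal illustration, not the disclosed implementation: the names `Candidate`, `PromotionCriteria`, and `admit` are hypothetical, and the all-enabled-criteria-must-pass policy is an assumption, since the source leaves the combination to the agent's profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A speculative item awaiting promotion (hypothetical structure)."""
    content: str
    corroborating_observations: int
    internally_consistent: bool
    source_attested: bool


@dataclass(frozen=True)
class PromotionCriteria:
    """Per-profile admissibility settings (illustrative fields only)."""
    min_corroboration: int = 1
    require_consistency: bool = True
    require_attestation: bool = True


def admit(c: Candidate, p: PromotionCriteria) -> bool:
    """Return True only if every criterion enabled in the profile is met."""
    if c.corroborating_observations < p.min_corroboration:
        return False
    if p.require_consistency and not c.internally_consistent:
        return False
    if p.require_attestation and not c.source_attested:
        return False
    return True


# Only content that passes the gate is committed to verified memory.
verified_memory: list[Candidate] = []
draft = Candidate("door is open", 0, True, True)   # no corroboration yet
seen = Candidate("door is open", 2, True, True)    # corroborated twice
for item in (draft, seen):
    if admit(item, PromotionCriteria()):
        verified_memory.append(item)
```

In this sketch, leakage would correspond to an append to `verified_memory` that never passed through `admit`, and excess to criteria so strict that `admit` never returns True.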

Disruption of this surface can occur in either direction. Containment leakage denotes a failure mode in which speculative content crosses into verified state without traversing the promotion criteria, producing the positive-symptom phenotype. Containment excess denotes the dual failure mode in which the promotion criteria become so stringent — or the speculative surface itself is so suppressed — that no content reaches verification, producing the negative-symptom phenotype. Both failure modes degrade the agent's effective behavior, but they degrade it through opposite mechanisms operating on the same architectural primitive.

The clinical analogy is precise enough to be useful and inexact enough to warrant caution. Positive psychotic symptoms in human patients (hallucination, delusion, formal thought disorder) are not literally identical to artificial-agent confabulation, and negative symptoms (avolition, alogia, affective flattening) are not literally identical to a planner that has stopped emitting candidate plans. The analogy is structural rather than etiological: both human and artificial cognitive systems include a separation between hypothesis and commitment, and disruption of that separation in either direction produces a recognizable signature on observable behavior.

Operating Parameters: Detection And Diagnostic Axis

Detection of positive symptoms involves monitoring verified memory for content whose lineage record either bypassed standard promotion or recorded a promotion event that fails reverification on audit. Indicators include: verified beliefs without an admissible source-attestation chain; verified observations whose timestamp falls outside any active perception window; verified plan steps that reference world states for which no admissible evidence exists; and downstream actions whose justification chain terminates at a non-promoted speculative node. The signature is over-population of verified state with content that did not earn its place there.
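A lineage audit along these lines might look like the following sketch. The record layout (`VerifiedRecord`) and the flag names are assumptions made for illustration; the source does not specify a schema. For brevity the timestamp check is applied to every record, although the text applies it to verified observations specifically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerifiedRecord:
    """A verified-memory entry with its lineage (hypothetical fields)."""
    record_id: str
    attestation_chain: tuple[str, ...]   # empty means no admissible source
    timestamp: float
    promotion_event_id: Optional[str]    # None means promotion was bypassed


def leakage_indicators(
    rec: VerifiedRecord,
    perception_windows: list[tuple[float, float]],
) -> list[str]:
    """List which leakage indicators a single verified record trips."""
    flags = []
    if rec.promotion_event_id is None:
        flags.append("bypassed_promotion")
    if not rec.attestation_chain:
        flags.append("no_attestation_chain")
    in_window = any(start <= rec.timestamp <= end
                    for start, end in perception_windows)
    if not in_window:
        flags.append("outside_perception_window")
    return flags


windows = [(0.0, 10.0), (20.0, 30.0)]
clean = VerifiedRecord("r1", ("sensor-a",), 5.0, "p-17")
suspect = VerifiedRecord("r2", (), 15.0, None)
```

Here `clean` trips no indicator, while `suspect` trips all three: it has no promotion event, no attestation chain, and a timestamp that falls between the two perception windows.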

Detection of negative symptoms involves monitoring the speculative surface for suppression. Indicators include: planning graph activity that fails to branch on choice points; promotion attempts that are uniformly rejected by the admissibility criteria; expressive surfaces (utterance, action, plan articulation) whose output rate drops below baseline despite continued input; and the agent's repeated refusal to commit to any course of action when committing is required. The signature is under-population of verified state, with the speculative surface either dormant or actively pruned before any candidate can mature.
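The suppression indicators can be sketched as checks over a window of counters. The `SurfaceWindow` fields, the `drop_fraction` parameter, and the flag names are hypothetical; in particular, treating "below baseline" as "below half the baseline output ratio" is an assumed threshold, not one given in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceWindow:
    """Counters for one monitoring window (hypothetical telemetry)."""
    choice_points: int
    branches_emitted: int
    promotion_attempts: int
    promotions_rejected: int
    outputs: int
    inputs: int


def excess_indicators(
    w: SurfaceWindow,
    baseline_output_ratio: float,
    drop_fraction: float = 0.5,   # assumed cutoff relative to baseline
) -> list[str]:
    """List which suppression indicators one monitoring window trips."""
    flags = []
    # The planner met choice points but never branched on them.
    if w.choice_points > 0 and w.branches_emitted == 0:
        flags.append("no_branching")
    # Every promotion attempt in the window was rejected.
    if w.promotion_attempts > 0 and w.promotions_rejected == w.promotion_attempts:
        flags.append("uniform_rejection")
    # Output rate fell well below baseline despite continued input.
    if w.inputs > 0 and (w.outputs / w.inputs) < baseline_output_ratio * drop_fraction:
        flags.append("output_rate_drop")
    return flags


healthy = SurfaceWindow(4, 9, 10, 3, 8, 10)
suppressed = SurfaceWindow(4, 0, 10, 10, 1, 10)
```

The fourth indicator in the text, repeated refusal to commit when committing is required, is omitted here because it depends on task context that simple counters do not capture.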

The two signatures are not mutually exclusive. An agent can exhibit positive symptoms in one cognitive domain (for example, confabulating memories of a recent interaction) while exhibiting negative symptoms in another (for example, refusing to plan a multi-step task whose admissibility criteria have become unattainable). The diagnostic axis therefore operates per-domain, with the agent's profile maintaining a separate positive-negative score for each cognitive surface under monitoring. Aggregate dashboards may roll the per-domain scores up to a global figure, but intervention proceeds at the domain level where the failure is localized.

The numeric encoding of the axis admits several embodiments. The simplest is a signed scalar in which positive values indicate leakage pressure and negative values indicate excess pressure, with magnitude reflecting the rate at which the failure mode is accumulating relative to the per-domain baseline. A two-component embodiment maintains separate non-negative measures for leakage and excess, with the diagnostic axis derived as the difference and a composite severity derived as the sum. A higher-dimensional embodiment further decomposes leakage into source-attestation failures, corroboration failures, and timestamp-window failures, and decomposes excess into criteria-stringency, hypothesis-bandwidth, and prompted-generation suppression — supporting fine-grained intervention selection rather than a single binary direction. The credentialed profile records which embodiment governs each domain so that downstream auditors can interpret the recorded scores against the correct schema.
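The two-component embodiment can be illustrated in a few lines. The class name `TwoComponentScore` and its field names are hypothetical; only the relationships (axis as the difference, severity as the sum) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoComponentScore:
    """Separate non-negative measures for leakage and excess."""
    leakage: float
    excess: float

    def __post_init__(self) -> None:
        if self.leakage < 0 or self.excess < 0:
            raise ValueError("components must be non-negative")

    @property
    def axis(self) -> float:
        """Signed diagnostic axis: positive is leakage, negative is excess."""
        return self.leakage - self.excess

    @property
    def severity(self) -> float:
        """Composite severity regardless of direction."""
        return self.leakage + self.excess


leaky = TwoComponentScore(leakage=0.75, excess=0.25)
mixed = TwoComponentScore(leakage=0.5, excess=0.5)
```

The `mixed` case shows what the signed scalar alone cannot express: its axis is zero, which would read as baseline, yet its severity equals that of `leaky` because both failure modes are active at once.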

Hysteresis in the diagnostic and intervention loop is a parameter of the embodiment. Without hysteresis, an agent oscillating near the intervention threshold will be subjected to alternating direction-specific interventions in rapid succession, producing observable instability and potentially compounding the underlying disruption. The credentialed profile admits a hysteresis margin around each threshold, so that an intervention applied in one direction must produce a sustained crossing into the opposite-side stable region before the opposing intervention is admitted. The hysteresis margin is itself recorded in the profile and tunable per-domain, supporting closed-loop adjustment as operational evidence accumulates.
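One possible reading of the hysteresis rule is sketched below. The class `HysteresisGate`, the symmetric threshold, and the `sustain` sample count are assumptions: the source states that a sustained crossing is required before the opposing intervention is admitted, but does not fix how "sustained" is measured.

```python
from typing import Optional


class HysteresisGate:
    """Direction gate with a margin around the threshold (hypothetical)."""

    def __init__(self, threshold: float, margin: float, sustain: int):
        self.threshold = threshold
        self.margin = margin
        self.sustain = sustain          # consecutive samples needed to flip
        self.active: Optional[str] = None
        self._opposing_run = 0

    def update(self, axis: float) -> Optional[str]:
        trigger = self.threshold + self.margin
        release = self.threshold - self.margin
        if self.active is None:
            if axis >= trigger:
                self.active = "strengthen"
            elif axis <= -trigger:
                self.active = "relax"
            return self.active
        # Re-express the axis so positive means "still on the active side".
        signed = axis if self.active == "strengthen" else -axis
        if signed <= -trigger:
            # Count consecutive samples deep in the opposite region.
            self._opposing_run += 1
            if self._opposing_run >= self.sustain:
                self.active = "relax" if self.active == "strengthen" else "strengthen"
                self._opposing_run = 0
        else:
            self._opposing_run = 0
            if abs(axis) <= release:
                self.active = None      # back inside the stable band
        return self.active


gate = HysteresisGate(threshold=1.0, margin=0.2, sustain=2)
samples = [1.1, 1.3, 0.9, -1.3, 1.3, -1.3, -1.3, 0.0]
decisions = [gate.update(x) for x in samples]
```

In this trace the first sample does not trigger because it sits inside the margin, the single excursion to -1.3 does not flip the direction, and only two consecutive samples in the opposite region admit the opposing intervention.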

Alternative Embodiments

The diagnostic axis admits embodiments at multiple levels of the credentialed architecture. At the lowest level, per-promotion-event telemetry records each speculative-to-verified transition with its admissibility evaluation, and the positive-negative axis is computed directly from the rate and disposition of these events. At an intermediate level, the axis is computed from periodic snapshots of the verified-memory and speculative-surface populations, comparing growth rates and rejection rates against per-domain baselines. At the highest level, the axis is inferred from observable behavior alone, using output-rate and confabulation-rate proxies when direct introspection of the credentialed surface is unavailable.
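The lowest-level embodiment, computing the axis from per-promotion-event telemetry, might be sketched as follows. The disposition labels (`"admitted"`, `"rejected"`, `"bypassed"`), the function name, and the choice to measure each component as a rate in excess of its baseline are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def axis_from_events(
    dispositions: Sequence[str],
    baseline_bypass_rate: float,
    baseline_reject_rate: float,
) -> float:
    """Signed axis from one window of promotion-event dispositions.

    Positive values indicate leakage pressure, negative values excess.
    """
    if not dispositions:
        raise ValueError("no promotion events in window")
    counts = Counter(dispositions)
    total = len(dispositions)
    # Each component is the rate above its per-domain baseline, floored at 0.
    leakage = max(0.0, counts["bypassed"] / total - baseline_bypass_rate)
    excess = max(0.0, counts["rejected"] / total - baseline_reject_rate)
    return leakage - excess


leaking = ["admitted", "admitted", "bypassed", "bypassed"]
starved = ["rejected", "rejected", "rejected", "rejected"]
```

With a bypass baseline of 0.0 and a rejection baseline of 0.25, the `leaking` window scores positive and the `starved` window scores negative, matching the direction convention of the signed-scalar embodiment.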

Alternative embodiments include integration with external auditors that compute the axis from logged traces rather than from live telemetry; integration with the agent's own self-monitoring profile, in which the agent emits a periodic self-diagnostic indicating whether it perceives itself to be drifting positive or negative; and integration with a multi-agent supervisory layer in which one agent's positive-negative score is computed by an independent second agent observing only the first agent's outward-facing surfaces.

The intervention surface admits parallel embodiments. Strengthening containment can be effected by tightening promotion criteria, by adding required corroboration sources, by injecting an explicit reverification pass before promotion, or by demoting recently promoted content back into speculation pending fresh evidence. Relaxing containment can be effected by loosening criteria, by reducing required corroboration count, by enlarging the speculative surface's admitted hypothesis bandwidth, or by injecting prompted hypothesis generation when the speculative surface has fallen below a minimum activity threshold.
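A subset of these operations can be sketched as pure functions over a criteria record. The `Criteria` fields, the step sizes, and the `floor` parameter are hypothetical; the floor is an added safeguard in this sketch so that relaxation never removes the corroboration requirement entirely.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Criteria:
    """Per-domain containment settings (illustrative fields only)."""
    min_corroboration: int
    reverify_before_promotion: bool
    hypothesis_bandwidth: int


def strengthen(c: Criteria) -> Criteria:
    """Tighten containment: more corroboration plus a reverification pass."""
    return replace(
        c,
        min_corroboration=c.min_corroboration + 1,
        reverify_before_promotion=True,
    )


def relax(c: Criteria, floor: int = 1) -> Criteria:
    """Loosen containment: less corroboration, wider hypothesis bandwidth."""
    return replace(
        c,
        min_corroboration=max(floor, c.min_corroboration - 1),
        hypothesis_bandwidth=c.hypothesis_bandwidth * 2,
    )


base = Criteria(min_corroboration=2, reverify_before_promotion=False,
                hypothesis_bandwidth=4)
tightened = strengthen(base)
loosened = relax(base)
```

Returning a new record rather than mutating in place keeps each intervention easy to log against the lineage chain, since the before and after states are both available.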

Composition: Why Direction-Specific Intervention Matters

Positive and negative symptoms require opposite interventions. Positive symptoms — leakage of speculation into verified state — call for containment strengthening: tighter promotion criteria, additional corroboration requirements, explicit re-verification of recently promoted content. Negative symptoms — suppression of the speculative surface — call for containment relaxation: looser promotion criteria, broader admitted hypothesis bandwidth, or active prompting to restore baseline speculative activity. Treating both with the same intervention is not merely suboptimal; it is actively harmful. A uniform tightening cures the leakage in domains showing positive symptoms while exacerbating the suppression in domains showing negative symptoms, and a uniform loosening does the converse.

The classification is therefore not a label applied after the fact but an operational input to the intervention pipeline. The agent's profile records, per domain, the current position on the positive-negative axis, the threshold beyond which intervention is triggered, and the direction of the corresponding intervention. The lineage chain records each intervention event with its direction, magnitude, and per-domain effect, supporting closed-loop tuning of the intervention parameters across the agent's operational lifetime.
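The per-domain selection step can be sketched as a lookup over the profile. The domain names, the symmetric threshold, and the `(axis, threshold)` tuple layout are illustrative assumptions.

```python
def select_interventions(
    profile: dict[str, tuple[float, float]],
) -> dict[str, str]:
    """Map each domain's (axis, threshold) to an intervention direction."""
    selected = {}
    for domain, (axis, threshold) in profile.items():
        if axis > threshold:
            selected[domain] = "strengthen"   # leakage: tighten containment
        elif axis < -threshold:
            selected[domain] = "relax"        # excess: loosen containment
        else:
            selected[domain] = "none"         # within tolerance
    return selected


profile = {
    "episodic_memory": (0.75, 0.5),   # confabulating: positive side
    "task_planning": (-0.9, 0.5),     # refusing to plan: negative side
    "perception": (0.1, 0.5),         # near baseline
}
plan = select_interventions(profile)
```

The example profile holds one leaking domain and one suppressed domain at the same time, so any single uniform intervention would worsen one of them; selecting per domain avoids that.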

Composition with other diagnostic axes is admitted by the credentialed profile. The positive-negative axis composes orthogonally with axes such as latency drift, output coherence, and value-alignment drift. A given disruption episode may register on multiple axes simultaneously, and the intervention pipeline weighs the composite signature when selecting among candidate interventions. Composite-signature handling is itself parameterized: simple embodiments apply each axis-specific intervention independently, while richer embodiments consult a precedence table that names dominant axes for given signature combinations and suppresses subordinate interventions until the dominant axis returns to baseline. The precedence table is recorded in the operator's profile and is itself subject to lineage-chain audit when modified.
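The richer embodiment with a precedence table might be sketched as below. The axis names, the exact-match lookup on the set of triggered axes, and the fallback to independent application when no entry matches are assumptions; the source does not specify how signature combinations are keyed.

```python
from typing import Dict, FrozenSet, Set


def admitted_axes(
    triggered: Set[str],
    precedence: Dict[FrozenSet[str], str],
) -> Set[str]:
    """Return the axes whose interventions are admitted right now.

    If the table names a dominant axis for this exact combination,
    subordinate interventions are suppressed; otherwise every
    triggered axis is handled independently.
    """
    dominant = precedence.get(frozenset(triggered))
    if dominant is None or dominant not in triggered:
        return set(triggered)
    return {dominant}


precedence = {
    frozenset({"containment", "latency_drift"}): "containment",
}
both = admitted_axes({"containment", "latency_drift"}, precedence)
unlisted = admitted_axes({"latency_drift", "output_coherence"}, precedence)
```

Once the dominant axis returns to baseline it drops out of the triggered set, the lookup no longer matches, and the previously suppressed intervention is admitted on the next evaluation.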

Prior-Art Distinction

Hallucination and confabulation in artificial cognitive systems have been extensively studied in the recent literature on language-model alignment, and a wide variety of detection and mitigation methods address the positive-symptom phenotype directly — ranging from retrieval-augmentation to consistency checks to refusal training. Negative-symptom phenotypes have been studied separately under names such as over-refusal, helpfulness collapse, and excessive hedging. The prior art treats these as independent problems addressed by independent mitigations.

The disclosed primitive's distinguishing element is the unification: positive and negative symptoms are treated as opposite directions on a single bidirectional axis derived from a single architectural primitive (the containment surface separating speculation from verification). This unification supports per-domain diagnosis and direction-specific intervention as a single coherent pipeline rather than as two independently engineered subsystems, and it admits the operational claim that uniform intervention is harmful — a claim that does not arise within the prior-art framing because the two phenotypes are not located on a shared axis.

Disclosure Scope

The disclosure encompasses the bidirectional containment-failure axis, the per-domain diagnostic computation, the detection signatures for leakage and excess, the direction-specific intervention surface, and the integration of these elements with the credentialed profile and lineage chain. Independent claims address (i) a credentialed cognitive system maintaining a containment surface and a per-domain positive-negative diagnostic axis derived from the surface's promotion telemetry, and (ii) the method of direction-specific intervention in which the diagnostic axis selects among containment-strengthening and containment-relaxing operations on a per-domain basis. Dependent claims address detection-signature alternatives, intervention-surface alternatives, multi-agent supervisory embodiments, and composition with co-monitored diagnostic axes.

The disclosure further contemplates that the bidirectional axis applies recursively: an intervention process is itself a credentialed cognitive operation that may exhibit its own positive-negative signature, with leakage manifesting as over-application of corrective measures unsupported by diagnostic evidence and excess manifesting as failure to apply corrective measures even when diagnostic evidence calls for them. Embodiments that introspect on the intervention pipeline at this meta-level admit a second-order positive-negative axis whose detection and intervention surface mirror the first-order architecture. The recursive structure terminates at whatever level the operator's profile declines further introspection, with the terminal level governed by externally credentialed audit rather than by the agent's own monitoring.

Lineage-chain integration supports retrospective diagnosis of historical disruption episodes from the recorded promotion telemetry, admitting the same per-domain positive-negative computation against archived records as is applied to live telemetry. Retrospective application is contemplated for post-incident review, for training-data curation, and for regulatory audit of agent behavior across regulated operational windows. The retrospective computation produces results functionally identical to the live computation provided the archived telemetry is complete; where telemetry is partial, the embodiment admits a partial-data variant that records confidence bounds on the resulting axis values rather than emitting a single point estimate.
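The partial-data variant can be illustrated with a worst-case interval. This sketch substitutes a deterministic bound for whatever confidence measure an embodiment would actually use: it assumes the expected event count is known, treats every missing event as either bypassed or rejected to get the extremes, and omits baseline subtraction for brevity. The function name and parameters are hypothetical.

```python
def axis_bounds(
    bypassed: int,
    rejected: int,
    observed: int,
    expected_total: int,
) -> tuple[float, float]:
    """Worst-case (lower, upper) bounds on bypass rate minus reject rate."""
    if bypassed < 0 or rejected < 0 or expected_total <= 0:
        raise ValueError("counts must be non-negative and total positive")
    if not (bypassed + rejected <= observed <= expected_total):
        raise ValueError("inconsistent event counts")
    missing = expected_total - observed
    # Lower bound: every missing event was a rejection.
    lower = (bypassed - rejected - missing) / expected_total
    # Upper bound: every missing event bypassed promotion.
    upper = (bypassed + missing - rejected) / expected_total
    return lower, upper


partial = axis_bounds(bypassed=2, rejected=1, observed=8, expected_total=10)
complete = axis_bounds(bypassed=2, rejected=1, observed=10, expected_total=10)
```

When the archive is complete the interval collapses to a single point, which is consistent with the statement that retrospective and live computations agree on complete telemetry.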