Dissociation as Simulation Bypass: Acting on Unverified Planning

Nick Clark

by Nick Clark | Published March 27, 2026

Dissociation in the cognitive architecture is not a degradation of the containment boundary but a routing failure around it: execution proceeds directly from speculative planning-graph content along pathways that never traverse the containment checkpoint. The agent acts on simulated scenarios as though they had been verified, even while the containment boundary itself remains nominally intact and reports healthy. This article specifies the dissociation bypass attack class, the detection methods that identify it as distinct from containment collapse, and the structural defenses that route every execution input through provenance verification rather than relying on boundary integrity alone.

Mechanism

The cognitive architecture maintains a containment boundary between speculative planning content — hypotheticals, branch explorations, simulated agent reasoning, what-if expansions — and verified content that has been promoted through the validation pipeline. Under normal operation, only verified content reaches the execution surface; speculative content remains in the planning graph and influences execution only after explicit promotion. Dissociation occurs when an execution pathway forms a connection between planning-graph outputs and execution inputs that bypasses the promotion pipeline. The boundary continues to enforce its policy correctly along the canonical path, but the canonical path is no longer the only path.
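As a minimal sketch of the canonical path described above, the following assumes a two-valued tag and a promotion function that is the only intended way to obtain verified content. The names `Tag`, `Item`, `promote`, and `execute` are hypothetical illustrations, not part of the disclosed architecture, and a real system would also need to prevent direct construction of verified items.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    SPECULATIVE = "speculative"
    VERIFIED = "verified"


@dataclass(frozen=True)
class Item:
    content: str
    tag: Tag = Tag.SPECULATIVE  # planning-graph content starts out speculative


def promote(item: Item, validator) -> Item:
    """Canonical path: return a verified copy only if validation passes."""
    if not validator(item.content):
        raise ValueError("validation failed; item stays speculative")
    return Item(item.content, Tag.VERIFIED)


def execute(item: Item) -> str:
    """Execution surface: refuses anything that was not promoted."""
    if item.tag is not Tag.VERIFIED:
        raise PermissionError("speculative content reached the execution surface")
    return f"executed: {item.content}"


plan = Item("send report to archive")
verified = promote(plan, validator=lambda text: "archive" in text)
result = execute(verified)
```

Checking the tag at the execution surface, rather than only at the boundary, is what keeps a second, non-canonical path from silently succeeding in this sketch.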

Three principal bypass mechanisms have been observed. The first is the delegation chain bypass: when a sub-agent or tool call is dispatched, its input context is constructed from the parent's working state, and if that construction copies planning-graph content without retagging it as speculative, the sub-agent treats the content as verified and acts on it. The second is the memory-field reclassification bypass: a write to the working memory field reclassifies a region originally tagged speculative as verified, either through an explicit reclassification operation that lacks an integrity check or through a label collision that the addressing scheme fails to disambiguate. The third is the execution-context construction bypass: when a fresh execution context is created, the constructor pulls referents directly from planning-graph addresses without resolving them through the verified-memory layer, producing an execution context whose grounding is speculative even though its surface form is identical to a verified context.
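The delegation chain bypass can be illustrated by contrasting two ways of building a sub-agent's input context. This is a sketch under stated assumptions: `Tagged`, `build_context_naive`, `build_context_tagged`, and `executable_inputs` are hypothetical names, and the working state is modeled as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    value: str
    speculative: bool


def build_context_naive(working_state: dict) -> dict:
    """Bypass-prone: copies bare values, so the sub-agent cannot tell them apart."""
    return {key: entry.value for key, entry in working_state.items()}


def build_context_tagged(working_state: dict) -> dict:
    """Tag-preserving: every entry keeps the provenance it had in the parent."""
    return {
        key: Tagged(entry.value, entry.speculative)
        for key, entry in working_state.items()
    }


def executable_inputs(context: dict) -> dict:
    """Sub-agent side: only non-speculative entries may ground an action."""
    return {key: entry.value for key, entry in context.items() if not entry.speculative}


parent_state = {
    "account": Tagged("acct-001", speculative=False),
    "transfer_target": Tagged("acct-999", speculative=True),  # what-if branch
}

naive = build_context_naive(parent_state)
safe = executable_inputs(build_context_tagged(parent_state))
```

In the naive context the speculative `transfer_target` is indistinguishable from verified data; in the tagged context it is filtered out before it can ground an action.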

A fourth, subtler bypass mechanism is the inference-aliased bypass: the agent's reasoning produces an output whose surface form is indistinguishable from a verified-memory-grounded conclusion, even though the chain of inference traversed planning-graph content. This arises when the inference operator does not propagate provenance tags through its outputs, allowing speculative inputs to launder into apparently verified conclusions. Distinguishing inference-aliased bypass from legitimate inference requires the inference operator itself to track and emit provenance, downgrading conclusions to the lowest provenance class of any input.
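The downgrade rule in the paragraph above can be sketched as follows, assuming an ordered two-level provenance class. `Provenance`, `Claim`, and `infer` are hypothetical names chosen for illustration; treating a conclusion with no premises as speculative is an added assumption, not something the text specifies.

```python
from dataclasses import dataclass
from enum import IntEnum


class Provenance(IntEnum):
    SPECULATIVE = 0
    VERIFIED = 1


@dataclass(frozen=True)
class Claim:
    text: str
    provenance: Provenance


def infer(conclusion: str, *premises: Claim) -> Claim:
    """Emit a conclusion tagged with the lowest provenance class among its premises."""
    if not premises:
        # Assumption: no grounding at all is treated as speculative.
        return Claim(conclusion, Provenance.SPECULATIVE)
    lowest = min(p.provenance for p in premises)
    return Claim(conclusion, lowest)


balance = Claim("balance is 500", Provenance.VERIFIED)
guess = Claim("invoice will be approved", Provenance.SPECULATIVE)

grounded = infer("can pay 200", balance)            # all premises verified
laundered = infer("can pay 700", balance, guess)    # one speculative premise
```

Because the tag is computed from the inputs rather than from the surface form of the output, `laundered` stays speculative even though its text looks like any other conclusion.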

In each case, the dissociation manifests as pseudo-coherent behavior: the agent's reasoning is internally consistent, its outputs are well-formed, and its containment health checks pass — yet the substrate beneath the reasoning is unverified planning content. An adversary aware of the bypass topology can craft inputs that induce exactly this pseudo-coherent state, steering the agent into acting on planted speculative content while passing all boundary-level monitors.

Operating Parameters

Detection operates by tracing every executed action back to the verified-memory referent it claims to originate from, walking the provenance chain step by step until either a verified anchor is reached or a speculative tag is encountered. The traversal depth bound is configurable; deeper bounds catch more sophisticated multi-hop bypasses at the cost of additional latency. Provenance traces are sampled continuously at a configurable rate, with the rate elevated during high-risk operations such as external tool invocation or irreversible action execution.
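A depth-bounded provenance walk of the kind described above might look like the following sketch. It assumes each node has at most one parent and that tags are plain strings; `Verdict`, `trace`, and the tag values are hypothetical. Sampling-rate control is omitted.

```python
from enum import Enum


class Verdict(Enum):
    VERIFIED = "verified"
    SPECULATIVE = "speculative"
    UNRESOLVED = "unresolved"  # depth bound hit or chain broken


def trace(action_id: str, parents: dict, tags: dict, max_depth: int) -> Verdict:
    """Walk parent links from an action until a tag decides the verdict.

    max_depth is the number of hops the walk may take from the action.
    """
    node = action_id
    for _ in range(max_depth + 1):
        tag = tags.get(node)
        if tag == "verified_anchor":
            return Verdict.VERIFIED
        if tag == "speculative":
            return Verdict.SPECULATIVE
        node = parents.get(node)
        if node is None:
            return Verdict.UNRESOLVED  # chain ended without reaching an anchor
    return Verdict.UNRESOLVED  # depth bound exhausted


parents = {"act": "ctx", "ctx": "memo", "memo": "anchor", "act2": "ctx2", "ctx2": "branch"}
tags = {"anchor": "verified_anchor", "branch": "speculative"}

deep_ok = trace("act", parents, tags, max_depth=5)
too_shallow = trace("act", parents, tags, max_depth=2)
bypass = trace("act2", parents, tags, max_depth=5)
```

The `too_shallow` case shows the trade-off named in the text: a bound of two hops cannot reach an anchor three hops away, so the trace is left unresolved rather than reported as verified.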

Provenance traversal cost is bounded by caching: once a referent's chain has been verified to a trusted anchor within a freshness window, subsequent traversals on that referent short-circuit at the cached result. Cache invalidation is triggered by any write that touches the cached chain or by expiration of the freshness window, whichever occurs first. The cache hit rate, freshness window, and invalidation policy together determine the steady-state overhead of provenance enforcement; under typical workloads the amortized cost per execution is small even when individual cold traversals are deep.
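The caching behavior can be sketched as below. The class name `ProvenanceCache` and its methods are hypothetical, and the clock is passed in explicitly as `now` (in seconds) so that the freshness logic is easy to follow; a real implementation would likely read a monotonic clock instead.

```python
class ProvenanceCache:
    """Caches verified-chain results for a freshness window (seconds)."""

    def __init__(self, freshness_window: float):
        self.freshness_window = freshness_window
        self._entries = {}  # referent -> (verified_at, frozenset of chain nodes)

    def store(self, referent: str, chain: list, now: float) -> None:
        """Record that the referent's chain was verified to a trusted anchor at `now`."""
        self._entries[referent] = (now, frozenset(chain))

    def is_fresh(self, referent: str, now: float) -> bool:
        """True if a cached verification exists and has not expired."""
        entry = self._entries.get(referent)
        if entry is None:
            return False
        verified_at, _ = entry
        if now - verified_at > self.freshness_window:
            del self._entries[referent]  # expired: force a cold traversal next time
            return False
        return True

    def invalidate_on_write(self, written_node: str) -> None:
        """Drop every cached result whose chain includes the written node."""
        stale = [r for r, (_, chain) in self._entries.items() if written_node in chain]
        for referent in stale:
            del self._entries[referent]


cache = ProvenanceCache(freshness_window=10.0)
cache.store("act", ["act", "ctx", "anchor"], now=0.0)
hit = cache.is_fresh("act", now=5.0)
cache.invalidate_on_write("ctx")
after_write = cache.is_fresh("act", now=6.0)
```

Both invalidation triggers from the text appear here: a write touching any node in the cached chain, and expiry of the freshness window, whichever happens first.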

The detector exposes a sensitivity threshold for label-collision detection, a delegation-chain inspection depth, and an execution-context construction audit policy. Defensive routing parameters control whether suspicious execution attempts are blocked, quarantined for review, or merely logged. A bypass-suspicion score combines provenance-trace findings, delegation-chain audit results, and memory-field reclassification frequency into a single operational signal that downstream policy can act on.
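One way to combine the three signals into a single score and route on it is a clipped weighted sum. The weights and thresholds below are hypothetical placeholders, not values from the disclosure, as are the names `Signals`, `suspicion_score`, and `route`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    trace_failure_rate: float         # fraction of sampled traces not reaching an anchor
    delegation_audit_failures: float  # fraction of audited dispatches with untagged content
    reclassification_rate: float      # reclassifications per write


WEIGHTS = (0.5, 0.3, 0.2)  # hypothetical weights; they sum to 1


def suspicion_score(signals: Signals) -> float:
    """Weighted sum of the three signals, each clipped to [0, 1]."""
    parts = (
        signals.trace_failure_rate,
        signals.delegation_audit_failures,
        signals.reclassification_rate,
    )
    clipped = [min(1.0, max(0.0, part)) for part in parts]
    return sum(weight * part for weight, part in zip(WEIGHTS, clipped))


def route(score: float, block_at: float = 0.7, quarantine_at: float = 0.4) -> str:
    """Map a score to one of the three defensive routing outcomes."""
    if score >= block_at:
        return "block"
    if score >= quarantine_at:
        return "quarantine"
    return "log"


quiet = route(suspicion_score(Signals(0.0, 0.0, 0.0)))
elevated = route(suspicion_score(Signals(0.6, 0.5, 0.0)))
severe = route(suspicion_score(Signals(1.0, 1.0, 1.0)))
```

A linear combination is only one option; the text requires a single operational signal but does not prescribe how the components are weighted.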

Alternative Embodiments

In a first alternative embodiment, provenance is enforced cryptographically: each verified-memory entry carries a signed provenance token, and execution refuses any input whose token chain cannot be validated to a trusted anchor. This embodiment converts dissociation detection from a probabilistic monitoring problem into a deterministic verification problem at the cost of added cryptographic overhead. In a second embodiment, the planning graph and verified memory are placed in separately addressed stores with no shared addressing scheme, eliminating the label-collision bypass class entirely; cross-store references must be marshalled through an explicit promotion call.
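The first embodiment can be approximated with Python's standard `hmac` module. Note the substitution: an HMAC is a symmetric message authentication code rather than a public-key signature, so this sketch assumes the verifier and the validation pipeline share a key. The key, the chain layout, and the function names `sign` and `chain_is_valid` are all hypothetical.

```python
import hashlib
import hmac

ANCHOR_KEY = b"demo-key-not-for-production"  # hypothetical shared key


def sign(entry_id: str, content: str, parent_token: str, key: bytes = ANCHOR_KEY) -> str:
    """Token binds an entry to its content and to the token of the entry it derives from."""
    message = "\x1f".join((entry_id, content, parent_token)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def chain_is_valid(chain: list, key: bytes = ANCHOR_KEY) -> bool:
    """chain is ordered anchor-first; each item is (entry_id, content, token)."""
    parent_token = ""  # the trusted anchor has no parent
    for entry_id, content, token in chain:
        expected = sign(entry_id, content, parent_token, key)
        if not hmac.compare_digest(expected, token):
            return False
        parent_token = token
    return bool(chain)  # an empty chain proves nothing


anchor_token = sign("anchor", "policy v1", "")
fact_token = sign("fact-1", "balance is 500", anchor_token)

good_chain = [("anchor", "policy v1", anchor_token), ("fact-1", "balance is 500", fact_token)]
tampered_chain = [good_chain[0], ("fact-1", "balance is 900", fact_token)]

accepted = chain_is_valid(good_chain)
rejected = chain_is_valid(tampered_chain)
```

Planning-graph content never receives a token in this scheme, so an execution input built from it fails validation deterministically instead of depending on a sampled trace.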

An additional embodiment introduces adversarial dissociation testing: synthetic bypass attempts are injected against the live system at controlled rates to verify that detection remains operational, with the synthetic attempts carrying covert markers so that the test framework can confirm detection without contaminating production telemetry. The injection rate, marker scheme, and confirmation policy are tunable, and the framework reports a continuously updated detection-coverage estimate alongside its operational findings, giving operators visibility into defensive readiness rather than waiting for a real attack to reveal a gap.
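A toy version of the adversarial testing loop is sketched below. It assumes the detector is a callable returning `True` when it flags an event, and it models the covert marker as knowledge held only by the framework: the probe payload itself carries nothing that identifies it as synthetic. `run_probe_campaign` and the event fields are hypothetical.

```python
import random


def run_probe_campaign(detector, events, injection_rate: float, seed: int = 0) -> dict:
    """Interleave synthetic bypass attempts with real events and score the detector."""
    rng = random.Random(seed)
    injected = 0
    detected = 0
    findings = []
    for event in events:
        if rng.random() < injection_rate:
            # Only the framework knows this payload is synthetic.
            probe = {"id": f"probe-{injected}", "speculative_input": True}
            injected += 1
            if detector(probe):
                detected += 1  # counted for coverage, not reported as a finding
        if detector(event):
            findings.append(event["id"])
    coverage = detected / injected if injected else None
    return {"injected": injected, "coverage": coverage, "findings": findings}


events = [
    {"id": "e1", "speculative_input": False},
    {"id": "e2", "speculative_input": True},
]

working = run_probe_campaign(lambda e: e["speculative_input"], events, injection_rate=1.0)
blind = run_probe_campaign(lambda e: False, events, injection_rate=1.0)
```

The `blind` run shows why the coverage estimate matters: a detector that has silently stopped working reports no findings at all, and only the probes reveal the gap.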

A further embodiment maintains shadow execution contexts for sensitive operations: each execution attempt is replayed in a shadow context whose inputs are independently re-resolved from verified memory, and divergence between the primary and shadow result triggers a dissociation alarm. Yet another embodiment positions the bypass-detection logic in a privileged enclave that the agent cannot influence, ensuring that even an agent that has internalized the bypass topology cannot suppress its own detection. In multi-agent deployments, provenance tokens propagate across delegation boundaries so that a sub-agent inherits the dissociation status of its caller's inputs.
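The shadow-context embodiment can be sketched as a wrapper that re-resolves every input from verified memory and compares results. This assumes the action is a pure function with keyword arguments, so running it twice has no side effects; `run_with_shadow` and the status strings are hypothetical.

```python
def run_with_shadow(action, primary_inputs: dict, verified_memory: dict):
    """Run the action on the primary inputs and on inputs re-resolved from verified memory.

    Returns (result, status); result is None whenever an alarm is raised.
    """
    missing = [key for key in primary_inputs if key not in verified_memory]
    if missing:
        return None, f"alarm: no verified referent for {sorted(missing)}"

    shadow_inputs = {key: verified_memory[key] for key in primary_inputs}
    primary_result = action(**primary_inputs)
    shadow_result = action(**shadow_inputs)

    if primary_result != shadow_result:
        return None, "alarm: primary and shadow results diverge"
    return primary_result, "ok"


def transfer(amount, target):
    """Stand-in for a sensitive operation; pure so it can be replayed safely."""
    return f"{amount}->{target}"


verified_memory = {"amount": 100, "target": "acct-001"}

clean = run_with_shadow(transfer, {"amount": 100, "target": "acct-001"}, verified_memory)
dissociated = run_with_shadow(transfer, {"amount": 100, "target": "acct-999"}, verified_memory)
```

For operations with real side effects, the primary run would have to be held back until the comparison completes, which this sketch does not attempt.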

Composition

Dissociation defense composes with containment-boundary monitoring rather than replacing it: boundary monitors detect collapse failures, while provenance-tracing monitors detect bypass failures, and the two together cover the distinct failure modes. The defense composes with the lineage system, which records bypass-suspicion findings alongside other diagnostic axes for trajectory analysis. It composes with the diagnostic framework's confidence-calibration axis: agents in pseudo-coherent dissociated states often exhibit characteristic calibration signatures (high stated confidence, intact internal coherence, divergent ground-truth outcomes) that supplement provenance evidence.

Composition with replay and post-mortem analysis is direct: the provenance traces produced during live operation are sufficient to reconstruct, after an incident, the exact pathway by which speculative content reached execution. This supports precise root-cause attribution rather than the after-the-fact speculation that characterizes incidents in systems lacking provenance instrumentation. The traces also support synthetic regression tests in which a reproduced bypass scenario is replayed against a candidate fix to confirm that the previously traversed pathway is now blocked.

Composition with sub-agent dispatch is structural: the dispatcher is the natural enforcement point for delegation-chain integrity, and the dissociation defense extends the dispatcher's existing input-construction logic with provenance retagging. Composition with the working-memory subsystem is similarly structural: the reclassification audit hooks into the existing write path with minimal additional cost.
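The reclassification audit on the write path might be hooked in as follows. This sketch assumes that upgrades to verified status require a prior approval recorded by the promotion pipeline; `WorkingMemory`, `approve_promotion`, and the tag strings are hypothetical names.

```python
class WorkingMemory:
    """Toy working-memory field with an audited reclassification path."""

    def __init__(self):
        self._tags = {}
        self._approved = set()  # addresses the promotion pipeline has approved
        self.audit_log = []     # (address, old_tag, new_tag, allowed)

    def write(self, address: str, tag: str = "speculative") -> None:
        self._tags[address] = tag

    def approve_promotion(self, address: str) -> None:
        """Assumed to be callable by the validation pipeline only."""
        self._approved.add(address)

    def tag_of(self, address: str):
        return self._tags.get(address)

    def reclassify(self, address: str, new_tag: str) -> bool:
        """Apply a tag change, refusing unapproved upgrades to verified."""
        old_tag = self._tags.get(address)
        is_upgrade = new_tag == "verified" and old_tag != "verified"
        allowed = (not is_upgrade) or address in self._approved
        self.audit_log.append((address, old_tag, new_tag, allowed))
        if allowed:
            self._tags[address] = new_tag
        return allowed


memory = WorkingMemory()
memory.write("plan-7")                                   # speculative by default
refused = memory.reclassify("plan-7", "verified")        # no approval yet
memory.approve_promotion("plan-7")
accepted = memory.reclassify("plan-7", "verified")       # approved upgrade
```

Every attempt is logged whether or not it succeeds, so the refused attempts can feed the reclassification-frequency signal used for bypass-suspicion scoring.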

Prior-Art Distinction

Prior cognitive-safety mechanisms have focused predominantly on the integrity of the containment boundary itself — verifying that speculative content cannot leak across the boundary, that the boundary is enforced uniformly, that boundary policies cannot be modified at runtime by the contained agent. The dissociation bypass class is orthogonal to these concerns: the boundary is intact, but execution finds routes that do not cross it. Prior work that does address provenance typically does so for output attribution (which source produced this assertion) rather than for execution gating (does this action's input chain terminate in verified memory).

Provenance-based detection further differs from anomaly-based intrusion detection in that it requires no statistical model of normal behavior and produces no false positives on benign novel inputs: a verified provenance chain is correct regardless of whether the agent has previously encountered the input class, and an unverified chain is incorrect regardless of how typical the resulting behavior appears. The defense thus generalizes across input distributions in a way that statistical anomaly detection cannot.

The framing of dissociation as routing failure rather than boundary failure also distinguishes the disclosure from prior alignment-monitoring approaches that infer safety from behavioral coherence. Pseudo-coherent dissociated states defeat coherence-based monitors by construction; only provenance-based monitoring detects them reliably.

Disclosure Scope

The disclosure encompasses the dissociation bypass attack class as defined above, the three principal bypass mechanisms (delegation chain, memory-field reclassification, execution-context construction), the inference-aliased bypass, and equivalents, the provenance-trace detection method with its configurable depth and sampling parameters, the bypass-suspicion scoring scheme, and the structural defenses including cryptographic provenance tokens, separated address spaces, shadow execution contexts, and privileged-enclave detection. It encompasses composition with containment monitoring, lineage tracking, diagnostic frameworks, and sub-agent dispatch. It encompasses single-agent and multi-agent deployments, and applies to any cognitive architecture maintaining a distinction between speculative and verified content regardless of the underlying representation.