Mechanism
Every large language model integrated into the architecture occupies the structural role of an untrusted proposal generator. The term "structurally untrusted" is used in contrast to the conventional assumption that a model's output is authoritative: a response, an answer, or a decision that the consuming system may adopt or act upon without independent validation. Here no output produced by any language model is authoritative. Every output is a proposal: a candidate semantic mutation that must be independently evaluated, validated, and either accepted, modified, or rejected by the agent-resident infrastructure before it can affect any agent field, any execution state, or any downstream behavior. In conventional tool-augmented agent frameworks the model occupies the role of decision-maker. In this disclosure the model occupies the role of proposal-maker, and the agent itself, through its resident validation engine, occupies the role of decision-maker.
The untrust is architectural rather than a runtime convention. The execution pathways are constructed such that no language model output can reach any agent field, governance decision, certification token, capability gate, or external-facing behavior without first passing through the validation engine and, where multiple models produce competing proposals, through the arbitration engine. There is no bypass path, no trusted-model exception, and no escalation mechanism by which a model can promote its own output to authoritative status. The model is confined to a bounded proposal zone on the proposal side of the proposal-validation boundary, and the confinement is enforced by the execution substrate itself, not by checks that could be misconfigured, disabled, or circumvented.
The untrust is motivated by an epistemic asymmetry between the model and the agent. The agent possesses verified state: its fields (intent, context, memory, policy reference, mutation descriptor, lineage, and affective state) are populated through governed mutation events that are cryptographically signed, policy-validated, and lineage-recorded, so each value carries a provenance chain. The model possesses no verified state. It does not maintain persistent state across inference calls, does not track the provenance of its own outputs, and cannot distinguish outputs well-grounded in verified information from outputs that are hallucinated, confabulated, or statistically plausible but incorrect.
The Unidirectional Interface
When the agent requires a mutation to one of its fields, for example updating its context block in response to new environmental information or generating a candidate execution plan, it may invoke one or more language models to produce candidate mutations. Each model receives a bounded prompt context derived from the agent's current state and produces one or more candidate mutations. These candidates flow through a unidirectional interface into the validation engine. No return path exists by which the validation engine's internal state, the agent's field values, or governance decisions are exposed back to the model. This prevents the model from learning to craft proposals that exploit knowledge of the validation logic.
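The one-way flow described above can be illustrated with a minimal Python sketch. Everything named here (Candidate, Agent, request_mutation, the set-membership policy) is hypothetical and chosen only for illustration; the point is that the model receives a bounded prompt and returns candidates, while no verdict or field value flows back to it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Candidate:
    field: str
    value: str


# Hypothetical model signature: bounded prompt in, candidates out, nothing else.
Model = Callable[[str], List[Candidate]]


class Agent:
    def __init__(self, fields: Dict[str, str], allowed: Dict[str, Set[str]]):
        self._fields = dict(fields)
        self._allowed = allowed  # assumed policy: permitted values per field

    def _bounded_prompt(self, keys: List[str]) -> str:
        # Only the selected verified fields are exposed to the model.
        return "\n".join(f"{k}={self._fields[k]}" for k in keys)

    def _validate(self, c: Candidate) -> bool:
        return c.value in self._allowed.get(c.field, set())

    def request_mutation(self, model: Model, keys: List[str]) -> None:
        # Returns None on purpose: no acceptance or rejection signal
        # travels back toward the model.
        for c in model(self._bounded_prompt(keys)):
            if self._validate(c):
                self._fields[c.field] = c.value

    def snapshot(self) -> Dict[str, str]:
        return dict(self._fields)


def toy_model(prompt: str) -> List[Candidate]:
    # Stand-in for a language model; the second proposal violates policy.
    return [Candidate("context", "raining"), Candidate("intent", "shutdown")]


agent = Agent(
    {"context": "sunny", "intent": "deliver"},
    {"context": {"sunny", "raining"}, "intent": {"deliver"}},
)
agent.request_mutation(toy_model, ["context"])
```

In this sketch the rejected "intent" proposal simply disappears; toy_model has no way to learn that it was rejected or why.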
The architecture extends to multiple models operating in parallel or in sequence. When the system invokes multiple models for the same operation, each model's output is independently submitted to the validation engine. The system does not aggregate outputs through voting, averaging, or ensemble techniques that would produce a blended output inheriting the authority of multiple models. Each output is treated as an independent proposal that must independently satisfy the validation criteria, and where multiple proposals survive, the arbitration engine resolves the selection through trust-weighted evaluation that is itself a governed, auditable semantic event.
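As a rough illustration of independent submission, the sketch below checks each model's output on its own against an assumed numeric range and never blends outputs. The function names and the range check are assumptions standing in for the full validation criteria.

```python
from typing import Dict, List, Tuple


def validate(value: float, lo: float, hi: float) -> bool:
    # Stand-in for the validation engine: a single range constraint.
    return lo <= value <= hi


def surviving_proposals(
    outputs: Dict[str, float], lo: float, hi: float
) -> List[Tuple[str, float]]:
    # Each output is judged alone. No voting, averaging, or ensembling
    # takes place, so one model's output cannot lend authority to another.
    return [(model, v) for model, v in outputs.items() if validate(v, lo, hi)]


outputs = {"model_a": 0.4, "model_b": 0.9, "model_c": 5.0}
survivors = surviving_proposals(outputs, 0.0, 1.0)
```

Here model_c's proposal is dropped on its own merits, and the two survivors would then go to the arbitration engine as a conflict set.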
The Mutation Engine
The mutation engine is interposed between the language model output boundary and the validation engine input boundary, and its function is to impose structural discipline on the inherently unstructured output of the model. It performs four operations on each raw proposal. First, schema mapping: the raw output, which may be natural language text, structured JSON, or an intermediate representation, is mapped onto the agent's field schema to identify which fields the proposal seeks to modify, and a proposal addressing multiple fields is decomposed into a set of per-field candidate mutations. Second, bounds normalization: proposed values are normalized to each field's defined value range, data type, and representational format, and a value outside representational bounds is flagged as malformed and rejected prior to validation. Third, conflict detection: when proposals from multiple models target the same field, the competing proposals are packaged as a conflict set for the arbitration engine. Fourth, lineage annotation: each candidate is annotated with the originating model identity, the prompt context supplied, a timestamp, and a hash of the raw proposal.
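The four operations can be sketched as follows. This is a minimal illustration under stated assumptions: the raw proposal is assumed to be JSON, the SCHEMA table and its fields are invented for the example, and unknown fields are silently skipped here although the text describes them as being flagged.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical field schema: name -> (type, minimum, maximum).
SCHEMA = {"priority": (int, 0, 10), "confidence": (float, 0.0, 1.0)}


@dataclass(frozen=True)
class Candidate:
    field: str
    value: float
    model_id: str    # lineage annotation: originating model
    prompt: str      # lineage annotation: prompt context supplied
    timestamp: float
    raw_hash: str    # lineage annotation: hash of the raw proposal


def process(raw: str, model_id: str, prompt: str) -> List[Candidate]:
    raw_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(proposal, dict):
        return []
    out: List[Candidate] = []
    for name, value in proposal.items():  # schema mapping, one per field
        if name not in SCHEMA:
            continue
        typ, lo, hi = SCHEMA[name]
        try:
            norm = typ(value)             # bounds normalization: coerce type
        except (TypeError, ValueError):
            continue
        if not lo <= norm <= hi:
            continue                      # malformed: rejected before validation
        out.append(Candidate(name, norm, model_id, prompt, time.time(), raw_hash))
    return out


def conflict_sets(cands: List[Candidate]) -> Dict[str, List[Candidate]]:
    # Conflict detection: group by field, keep fields targeted by >1 model.
    by_field: Dict[str, List[Candidate]] = {}
    for c in cands:
        by_field.setdefault(c.field, []).append(c)
    return {f: cs for f, cs in by_field.items()
            if len({c.model_id for c in cs}) > 1}


a = process('{"priority": "7", "confidence": 1.5, "mood": 3}', "model_a", "p")
b = process('{"priority": 2}', "model_b", "p")
conflicts = conflict_sets(a + b)
```

Note that nothing in the sketch judges whether a priority of 7 is a good idea; it only establishes that the value is well-formed and annotated.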
The mutation engine does not evaluate semantic correctness, factual accuracy, or policy compliance. Its role is structural: it ensures proposals are well-formed, schema-compliant, and annotated for governance, but it does not assess whether proposed values are good or desirable. That assessment is the exclusive responsibility of the validation engine. This separation ensures the mutation engine cannot inadvertently validate a proposal by passing it through, and that the validation engine receives candidates in a uniform format regardless of which model produced them.
The Validation Engine
The validation engine evaluates each candidate mutation against the full set of agent-resident constraints to determine whether the mutation may be incorporated into agent state. It is resident within the agent's execution environment and operates on the agent's verified state. It does not consult the model, does not consult external oracles, and does not defer to any authority other than the agent's own policy, lineage, and structural constraints. It is the enforcement boundary that gives operational meaning to the structural untrust of the model.
Each candidate is evaluated against a plurality of constraint categories. Policy compliance evaluates whether the proposed value falls within the policy-permitted range for that field. Lineage consistency evaluates whether the proposed value is consistent with the agent's mutation history; a mutation that would reverse a previously governed decision without a corresponding governance event, or that would contradict a cryptographically sealed prior state, fails. Integrity compliance evaluates whether the proposed mutation would drive the agent's integrity below the threshold at which coherence is maintained. Capability feasibility evaluates whether the proposed action can be structurally executed on the available substrate. Affective bounds evaluates whether the mutation would drive the agent's affective state outside its policy-bounded range.
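One way to picture the five constraint categories is as independent checks over the agent's verified state. The sketch below is illustrative only: the State and Candidate shapes, the delta-based integrity and affect models, and the flag-based lineage test are all simplifying assumptions, not the disclosed data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class State:                      # hypothetical slice of verified agent state
    policy_min: float
    policy_max: float
    integrity: float
    integrity_floor: float
    affect: float
    affect_min: float
    affect_max: float
    capabilities: frozenset


@dataclass(frozen=True)
class Candidate:                  # hypothetical candidate mutation
    value: float
    integrity_delta: float = 0.0
    affect_delta: float = 0.0
    needs: str = ""               # capability required to execute
    reverses_sealed: bool = False
    governance_event: bool = False


def policy(s: State, c: Candidate) -> Optional[str]:
    ok = s.policy_min <= c.value <= s.policy_max
    return None if ok else "value outside policy-permitted range"


def lineage(s: State, c: Candidate) -> Optional[str]:
    bad = c.reverses_sealed and not c.governance_event
    return "reverses a governed decision without a governance event" if bad else None


def integrity(s: State, c: Candidate) -> Optional[str]:
    bad = s.integrity + c.integrity_delta < s.integrity_floor
    return "integrity would fall below threshold" if bad else None


def capability(s: State, c: Candidate) -> Optional[str]:
    bad = bool(c.needs) and c.needs not in s.capabilities
    return f"capability '{c.needs}' unavailable" if bad else None


def affect(s: State, c: Candidate) -> Optional[str]:
    ok = s.affect_min <= s.affect + c.affect_delta <= s.affect_max
    return None if ok else "affective state would leave bounded range"


CHECKS: Dict[str, Callable[[State, Candidate], Optional[str]]] = {
    "policy_compliance": policy,
    "lineage_consistency": lineage,
    "integrity_compliance": integrity,
    "capability_feasibility": capability,
    "affective_bounds": affect,
}


def validate(s: State, c: Candidate) -> dict:
    # Every category is evaluated; a candidate passes only if none is violated.
    violations = {}
    for name, check in CHECKS.items():
        detail = check(s, c)
        if detail is not None:
            violations[name] = detail
    return {"passed": not violations, "evaluated": list(CHECKS),
            "violations": violations}


state = State(0.0, 10.0, 0.8, 0.5, 0.0, -1.0, 1.0, frozenset({"move"}))
record = validate(state, Candidate(value=5.0, integrity_delta=-0.4, needs="fly"))
```

The checks consult only the State object, which mirrors the requirement that the engine defer to no authority outside the agent's own policy, lineage, and structural constraints.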
The validation engine produces a structured validation record for each candidate, indicating whether it passed or failed, which categories were evaluated, which constraints were satisfied or violated, and the specific violation details. The record is persisted in the agent's lineage regardless of outcome, so that rejected mutations and the reasons for their rejection remain available for governance audit, for learning-through-rejection analysis, and for dispute resolution. The engine operates synchronously: a candidate receives a determination before any subsequent mutation for the same field is evaluated. It locks the target field during evaluation and either applies the mutation and releases the lock or discards it and releases the lock, an atomicity enforced by the execution substrate and not dependent on the model or the mutation engine.
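The lock-evaluate-release cycle and the persistence of records regardless of outcome might look like the following in outline. This is a sketch using Python's threading.Lock as a stand-in for substrate-level enforcement; FieldStore, its is_valid callback, and the record layout are hypothetical.

```python
import threading
from typing import Any, Callable, Dict, List


class FieldStore:
    def __init__(self, fields: Dict[str, Any]):
        self._fields = dict(fields)
        # One lock per field: evaluations of the same field are serialized.
        self._locks = {name: threading.Lock() for name in fields}
        self._lineage_lock = threading.Lock()
        self.lineage: List[dict] = []

    def evaluate(self, name: str, value: Any,
                 is_valid: Callable[[Any, Any], bool]) -> None:
        with self._locks[name]:          # held until a determination is made
            passed = is_valid(self._fields[name], value)
            with self._lineage_lock:
                # The record is persisted whether or not the candidate passed.
                self.lineage.append(
                    {"field": name, "proposed": value, "passed": passed})
            if passed:
                self._fields[name] = value
        # The lock is released on both the apply path and the discard path.

    def get(self, name: str) -> Any:
        return self._fields[name]


store = FieldStore({"priority": 3})
# Assumed toy constraint: a new priority must exceed the current one.
store.evaluate("priority", 4, lambda current, new: new > current)
store.evaluate("priority", 2, lambda current, new: new > current)
```

After these two calls the field holds 4, and the lineage contains both the accepted and the rejected proposal.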
Trust-Weighted Arbitration
The arbitration engine resolves conflicts when multiple models produce competing candidates for the same field or operation. It receives conflict sets from the mutation engine and selects a single winning candidate or synthesizes a reconciled candidate. Each model's proposal is weighted according to the model's accumulated trust score within the agent's governance context. The trust score is dynamic: accepted proposals increase it, rejected proposals decrease it, and proposals accepted but later found to have introduced errors or policy violations produce a retroactive penalty. The score is maintained per agent and per domain, so a model may be recognized as reliable for one category of proposal and unreliable for another.
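A dynamic, per-domain trust score of this kind could be kept in a small ledger, one instance per agent. The sketch below is an assumption-laden illustration: the initial value, the step sizes, the larger retroactive penalty, and the clamping to [0, 1] are all invented constants, since the text specifies only the direction of each adjustment.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class TrustLedger:
    """One ledger per agent; scores are keyed by (model, domain)."""

    def __init__(self, initial: float = 0.5, gain: float = 0.05,
                 loss: float = 0.05, retro: float = 0.2):
        self._gain, self._loss, self._retro = gain, loss, retro
        self._scores: DefaultDict[Tuple[str, str], float] = defaultdict(
            lambda: initial)

    def _bump(self, model: str, domain: str, delta: float) -> None:
        key = (model, domain)
        # Clamp to [0, 1] so the score stays usable as a weight.
        self._scores[key] = min(1.0, max(0.0, self._scores[key] + delta))

    def accepted(self, model: str, domain: str) -> None:
        self._bump(model, domain, self._gain)

    def rejected(self, model: str, domain: str) -> None:
        self._bump(model, domain, -self._loss)

    def retroactive_penalty(self, model: str, domain: str) -> None:
        # Applied when an accepted proposal is later found to be faulty.
        self._bump(model, domain, -self._retro)

    def score(self, model: str, domain: str) -> float:
        return self._scores[(model, domain)]


ledger = TrustLedger()
ledger.accepted("model_a", "planning")
ledger.retroactive_penalty("model_a", "planning")
ledger.accepted("model_a", "context")
```

Because the key includes the domain, model_a ends up less trusted for planning than for context updates, matching the per-domain behavior described above.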
Each candidate in the conflict set is scored on a plurality of evaluation dimensions, including semantic coherence with current state, consistency with the agent's intent field, alignment with the policy reference, and compatibility with the lineage trajectory. The per-dimension scores are multiplied by the originating model's trust weight to produce a trust-adjusted composite score, and the highest-scoring candidate is selected. If no candidate achieves a composite above a configurable minimum, the engine may reject all candidates and request new proposals or escalate to a governance authority. Where competing proposals are partially compatible, the engine may synthesize a reconciled candidate from the highest-scoring elements; the reconciled candidate is then submitted to the validation engine as though it were new, so reconciliation does not bypass validation, and its lineage annotation records all contributing models and the reconciliation logic.
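The selection step can be sketched as below. The text does not say how the per-dimension scores are combined before the trust weight is applied, so this example assumes an unweighted mean; the dimension names, the Scored type, and the threshold values are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DIMENSIONS = ("coherence", "intent", "policy", "lineage")


@dataclass(frozen=True)
class Scored:
    model_id: str
    value: str
    scores: Dict[str, float]   # one score in [0, 1] per dimension


def composite(c: Scored, trust: Dict[str, float]) -> float:
    # Assumption: dimensions are averaged, then scaled by the trust weight.
    mean = sum(c.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return trust[c.model_id] * mean


def arbitrate(conflict_set: List[Scored], trust: Dict[str, float],
              minimum: float) -> Optional[Scored]:
    best = max(conflict_set, key=lambda c: composite(c, trust), default=None)
    if best is None or composite(best, trust) <= minimum:
        return None   # reject all: request new proposals or escalate
    return best


trust = {"model_a": 0.4, "model_b": 0.8}
conflict_set = [
    Scored("model_a", "plan-1", {d: 0.9 for d in DIMENSIONS}),
    Scored("model_b", "plan-2", {d: 0.7 for d in DIMENSIONS}),
]
winner = arbitrate(conflict_set, trust, minimum=0.5)
nobody = arbitrate(conflict_set, trust, minimum=0.6)
```

In this example model_a's proposal scores higher on every dimension, yet model_b's wins because its trust weight is twice as large. Raising the minimum to 0.6 causes both candidates to be rejected.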
Every arbitration decision is recorded as a first-class semantic event in the agent's lineage, including the competing model identities, the candidates produced, the trust weights applied, the per-dimension scores, the selection or reconciliation logic, and the winning candidate. The event is cryptographically signed and sealed, so the trust-weight feedback loop operates on a tamper-resistant record. This makes arbitration decisions part of the agent's persistent memory, auditable governance artifacts, and inputs to cross-agent governance: a pattern in which a particular model consistently produces proposals that fail or are overridden can be propagated to other agents using the same model for network-wide trust recalibration.
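A tamper-evident arbitration record might be approximated as follows. The text does not specify a signature scheme, so this sketch substitutes an HMAC-SHA256 tag from the Python standard library and chains each record to its predecessor; a real deployment would more plausibly use asymmetric signatures. The seal and verify helpers and the event fields are hypothetical.

```python
import hashlib
import hmac
import json
from typing import List


def _tag(body: dict, key: bytes) -> str:
    # Canonical JSON so the same content always produces the same tag.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def seal(event: dict, key: bytes, prev_seal: str) -> dict:
    # Chaining to the previous seal makes removal or reordering detectable.
    body = dict(event, prev=prev_seal)
    return dict(body, seal=_tag(body, key))


def verify(lineage: List[dict], key: bytes) -> bool:
    prev = ""
    for rec in lineage:
        body = {k: v for k, v in rec.items() if k != "seal"}
        if body.get("prev") != prev:
            return False
        if not hmac.compare_digest(_tag(body, key), rec.get("seal", "")):
            return False
        prev = rec["seal"]
    return True


key = b"demo-key-not-for-production"
e1 = seal({"models": ["model_a", "model_b"], "weights": [0.4, 0.8],
           "winner": "model_b"}, key, "")
e2 = seal({"models": ["model_a", "model_c"], "weights": [0.4, 0.6],
           "winner": "model_c"}, key, e1["seal"])
tampered = dict(e1, winner="model_a")
```

Verifying the chain [e1, e2] succeeds, while substituting the tampered record fails, which is the property the trust-weight feedback loop relies on.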
Hallucination Prevention via Structural Starvation
The system prevents hallucination through a mechanism designated structural starvation: rather than detecting and filtering hallucinated content after it has been produced, the system denies the model access to the informational resources that would be necessary for hallucination to occur. Post-hoc filtering is inherently unreliable because the statistical patterns that produce hallucinated content also make that content appear plausible, allowing it to evade detection. Structural starvation instead constrains the informational environment so that hallucinated content is not generated in the first place. It is composable with any model-level alignment technique and does not depend on the model being well-aligned; it produces safe behavior through architectural containment regardless of the model's alignment status.
Structural starvation is implemented through five complementary constraints. Prompt bounding restricts the model to a bounded, curated prompt context derived from the agent's verified fields, not an open-ended window populated by retrieval augmentation, user history, or web scraping; the model cannot hallucinate about information it has never been given. Absence of external memory denies the model any persistent memory, knowledge base, retrieval store, or external data source beyond the current prompt; if the information required for a correct proposal is not present, the absence of the proposal is the structurally correct outcome. Forced reliance on agent fields requires proposals to reference verified field values, and a proposal referencing entities or facts not present in the verified fields is flagged during schema mapping as ungrounded and rejected. Intermediate rejection immediately discards any mutation that fails validation, without returning the validation record or guidance to the model. Stateless purging destroys the model's context after each inference call, so no residual state persists and the model cannot engage in multi-turn adversarial optimization that incrementally probes the validation boundary.
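Three of these constraints (prompt bounding, forced reliance on agent fields, and intermediate rejection) lend themselves to a short sketch. The proposal format with an explicit "references" list is an assumption made for illustration, and stateless purging is only approximated here by calling the model as a plain function with no carried-over arguments.

```python
from typing import Callable, Dict, List, Optional

Proposal = Dict[str, object]


def bounded_prompt(verified: Dict[str, str], keys: List[str]) -> str:
    # Prompt bounding: only curated, verified fields enter the prompt.
    return "\n".join(f"{k}: {verified[k]}" for k in keys)


def is_grounded(proposal: Proposal, keys: List[str]) -> bool:
    # Forced reliance on agent fields: every reference must name a field
    # that was actually supplied in the prompt.
    refs = proposal.get("references", [])
    return isinstance(refs, list) and all(r in keys for r in refs)


def propose_once(model: Callable[[str], Proposal],
                 verified: Dict[str, str],
                 keys: List[str]) -> Optional[Proposal]:
    proposal = model(bounded_prompt(verified, keys))
    if not is_grounded(proposal, keys):
        # Intermediate rejection: discarded, and nothing is sent back
        # to the model explaining why.
        return None
    return proposal


verified = {"context": "warehouse B, dock 4", "intent": "deliver parcel"}


def grounded_model(prompt: str) -> Proposal:
    return {"field": "context", "value": "dock 4 reached",
            "references": ["context"]}


def ungrounded_model(prompt: str) -> Proposal:
    # Refers to a source the agent never supplied.
    return {"field": "context", "value": "storm incoming",
            "references": ["context", "weather_api"]}


kept = propose_once(grounded_model, verified, ["context"])
dropped = propose_once(ungrounded_model, verified, ["context"])
```

The second proposal is dropped not because its claim was checked and found false, but because it cites something outside the verified fields, which is the sense in which the model is starved rather than filtered.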
Validation Feedback Asymmetry as Adversarial Defense
The deliberate informational asymmetry between the model as proposer and the validation engine as evaluator is itself a structurally enforced adversarial defense. The model does not receive the violated constraint, the validation record, or guidance on how to craft a passing proposal. Even a model fine-tuned on adversarial examples, subjected to red-teaming, or equipped with chain-of-thought reasoning cannot learn to circumvent validation criteria it cannot observe: it cannot optimize against a constraint it does not know exists, and it cannot probe the boundary because it receives no signal about where the boundary lies. The asymmetry is maintained across inference calls by stateless purging, so that even partial information inferable from a single rejection pattern is destroyed at the boundary of each call. This creates an architectural guarantee of non-circumvention that is stronger than filtering or alignment operating on the model's own parameters.
Multi-turn interaction is supported without violating this statelessness. Continuity is maintained not by the model but by the agent: each exchange produces governed mutations to the agent's fields, and when the next exchange requires the model, the agent constructs a bounded prompt from its current verified state, which includes the accumulated context from prior exchanges. The model receives this prompt, produces its proposal, and is purged. The interaction history is preserved in governed state, subject to the same validation and lineage constraints as all other agent state, rather than in the model's context window.
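The division of labor described above, where the agent holds the conversation and the model holds nothing, can be sketched in a few lines. The Agent class, its list-based context field, the six-entry prompt window, and the always-true validator are hypothetical simplifications.

```python
from typing import Callable, List


class Agent:
    def __init__(self, window: int = 6):
        self.context: List[str] = []   # governed state, not model memory
        self._window = window          # assumed bound on prompt size

    def turn(self, user_input: str, model: Callable[[str], str],
             validate: Callable[[str], bool]) -> None:
        # The prompt is rebuilt from the agent's state on every exchange.
        prompt = "\n".join(self.context[-self._window:] + [user_input])
        proposal = model(prompt)       # the model sees only this prompt
        if validate(proposal):
            # Continuity is recorded as a mutation of agent state.
            self.context.extend([user_input, proposal])


def stateless_model(prompt: str) -> str:
    # A pure function: it retains nothing between calls.
    return f"seen {len(prompt.splitlines())} lines"


agent = Agent()
agent.turn("hi", stateless_model, lambda p: True)
agent.turn("again", stateless_model, lambda p: True)
```

On the second turn the model reports three lines even though it kept no memory of the first turn, because the agent supplied the prior exchange from its own state.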
Distinction from Prior Art
Conventional language-model integration falls into two categories. Direct integration consumes model output as authoritative and applies any safety check as a post-hoc filter on the resulting behavior, treating the model as trusted by default and relying on the filter's completeness. Sandboxed integration executes model output in an isolated environment and observes its effects before merging them into agent state, treating the model as untrusted but conflating execution with proposal, since the runtime must actually run the proposed action to evaluate it.
The mechanism described here treats the model as untrusted by default but does not require execution to evaluate a proposal. Validation is performed against declared policy and verified agent state, not against observed behavior. This is material because it allows the agent to reject a proposal whose execution would be irreversible, costly, or hazardous without ever performing the action, and because the containment is a structural property of the execution substrate rather than a filter that can be removed or bypassed.
Disclosure Scope
The following are disclosed in the cognition filing (U.S. Application No. 19/647,395 and its international counterpart): the structural role of the language model as an untrusted proposal generator; the unidirectional interface from the model to the validation engine; the mutation engine performing schema mapping, bounds normalization, conflict detection, and lineage annotation; the validation engine evaluating candidate mutations against policy compliance, lineage consistency, integrity compliance, capability feasibility, and affective bounds; the trust-weighted arbitration engine and the recording of arbitration as a first-class sealed semantic event; the five constraints of structural starvation (prompt bounding, absence of external memory, forced reliance on agent fields, intermediate rejection, and stateless purging); and the validation feedback asymmetry as adversarial defense. This article describes that disclosed mechanism. The disclosure covers any system in which a language model produces structured candidate mutations that must traverse a policy-governed validation stage before affecting agent state, regardless of runtime environment, model architecture, or deployment topology. Implementations that route model output directly into agent state without a structurally enforced validation boundary, or that validate only against observed execution rather than against declared policy and verified state, fall outside the disclosure.