Multi-Model Arbitration With Shared Semantic State
by Nick Clark | Published March 27, 2026
Multi-model arbitration runs several inference models in parallel against a shared semantic state and combines their outputs under tier-weighted bounds, rather than electing a single model and discarding the rest. Disagreement between models is treated as a first-class structural signal: it is logged, bounded, and made available to downstream governance. This article specifies the arbitration mechanism, its operating parameters, alternative embodiments, composition with adjacent subsystems, prior-art distinction, implementation considerations, and disclosure scope.
Mechanism
At each inference cycle the agent presents the current semantic state and the candidate transition request to a declared set of inference models. Each model is registered in policy with a tier label, a declared competence envelope describing the workload classes for which the model is admissible, and a declared output schema. The arbitration runtime invokes admissible models in parallel, collects their outputs in a uniform structured form, and forwards the collection to the arbiter together with the originating state.
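The registration and invocation step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `RegisteredModel` and `invoke_admissible`, the use of a thread pool, and the rule that an output must carry exactly the declared schema fields are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class RegisteredModel:
    """Hypothetical registration record for one inference model."""
    name: str
    tier: int
    envelope: FrozenSet[str]             # admissible workload classes
    schema: FrozenSet[str]               # declared output fields
    infer: Callable[[dict, dict], dict]  # (state, request) -> output


def invoke_admissible(models: List[RegisteredModel], state: dict,
                      request: dict, workload_class: str) -> List[dict]:
    """Invoke every admissible model in parallel and collect uniform records."""
    admissible = [m for m in models if workload_class in m.envelope]
    if not admissible:
        return []
    records = []
    with ThreadPoolExecutor(max_workers=len(admissible)) as pool:
        futures = [(m, pool.submit(m.infer, state, request)) for m in admissible]
        for model, future in futures:
            output = future.result()
            # Assumption: an output must match the declared schema exactly.
            if set(output) != set(model.schema):
                raise ValueError(f"{model.name} violated its output schema")
            records.append({"model": model.name, "tier": model.tier,
                            "output": output})
    return records
```

The returned list is what would be forwarded to the arbiter together with the originating state.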
The arbiter combines outputs under a tier-weighted bound. Each model contributes a weight derived from its declared tier, its current trust slope, and any active attenuation imposed by upstream governance. The combined output is constructed by aggregating model outputs under a declared aggregation function chosen from a fixed enumeration: tier-dominant selection, tier-weighted average, monotone envelope, and bounded consensus. The chosen function is recorded in lineage along with the input weights and the per-model outputs, so that the combined output is fully reproducible from the recorded inputs.
Disagreement is computed as a structured field rather than discarded. For scalar outputs the disagreement is the weighted variance across contributing models; for structured outputs it is the count and identity of fields on which models disagree under the declared output schema. The disagreement field is published alongside the combined output and consumed downstream by the differential alarm subsystem, by the workload classifier, and by the lineage writer. A disagreement that exceeds a declared bound triggers one of three fallback paths: decomposition of the request into smaller transitions, escalation to a higher-tier model, or rejection with a recorded reason.
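The two disagreement measures named above can be sketched as follows. The function names are hypothetical, and the variance is normalized by the total weight, which is one common convention; the text does not fix a normalization.

```python
from typing import Dict, List, Sequence


def weighted_variance(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted variance of scalar outputs, normalized by total weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(w * v for v, w in zip(values, weights)) / total
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total


def field_disagreement(outputs: Dict[str, dict], schema: List[str]) -> dict:
    """Count and identify schema fields on which the models disagree."""
    contested = []
    for field in schema:
        values = [out.get(field) for out in outputs.values()]
        if any(v != values[0] for v in values[1:]):
            contested.append(field)
    return {"count": len(contested), "fields": contested}
```

For example, two equally weighted models reporting 1.0 and 3.0 have a weighted mean of 2.0 and a weighted variance of 1.0.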
Critically, all participating models read from and write to a shared semantic state object rather than to private working memories. This shared object is the canonical record of what the agent currently believes, and any mutation a model proposes must be expressed as a structured edit against the shared object. The arbiter therefore arbitrates not over opaque model outputs but over typed, schema-bound proposed mutations, which is what makes deterministic combination and disagreement measurement possible at all.
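A typed, schema-bound proposed mutation might look like the sketch below. The `ProposedEdit` record, the flat field-to-type schema, and the example field names are illustrative assumptions; the text requires only that a proposal be a structured edit against the shared object.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ProposedEdit:
    """Hypothetical structured edit proposed by one model."""
    model: str
    field: str
    value: Any


# Hypothetical shared-state schema: field name -> required Python type.
STATE_SCHEMA: Dict[str, type] = {"obstacle_distance_m": float, "lane": str}


def is_valid(edit: ProposedEdit, schema: Dict[str, type]) -> bool:
    """An edit is valid only if it targets a declared field with the right type."""
    expected = schema.get(edit.field)
    return expected is not None and isinstance(edit.value, expected)


def apply_edit(state: dict, edit: ProposedEdit, schema: Dict[str, type]) -> dict:
    """Return a new state with the edit applied; the input state is not mutated."""
    if not is_valid(edit, schema):
        raise ValueError(f"edit on {edit.field!r} violates the state schema")
    new_state = dict(state)
    new_state[edit.field] = edit.value
    return new_state
```

Because every proposal has the same shape, two proposals can be compared field by field, which is the property the arbiter relies on.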
Operating Parameters
Each registered model carries a tier weight in a declared range, a competence envelope expressed as a set of admissible workload classes, and an attenuation field that may be modified at runtime by the confidence governance subsystem. Tier weights determine the model's nominal contribution; competence envelopes determine whether the model is invoked at all for a given request; attenuation reduces a model's effective weight without removing it from the registered set, preserving its lineage contribution while limiting its influence.
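As a rough illustration of how these three parameters interact, the sketch below treats attenuation as a value in [0, 1] that scales the tier weight down multiplicatively. The multiplicative form, the default weight range, and the function names are assumptions; the text does not specify how attenuation is applied.

```python
from typing import FrozenSet, Tuple


def is_invoked(envelope: FrozenSet[str], workload_class: str) -> bool:
    """The competence envelope decides whether the model is invoked at all."""
    return workload_class in envelope


def effective_weight(tier_weight: float, attenuation: float,
                     weight_range: Tuple[float, float] = (0.0, 1.0)) -> float:
    """Assumed rule: attenuation in [0, 1] scales the nominal tier weight."""
    low, high = weight_range
    if not low <= tier_weight <= high:
        raise ValueError("tier weight outside its declared range")
    if not 0.0 <= attenuation <= 1.0:
        raise ValueError("attenuation must lie in [0, 1]")
    return tier_weight * (1.0 - attenuation)
```

Under this assumed rule a fully attenuated model has an effective weight of zero but is still invoked and recorded, which matches the intent of limiting influence without removing the model from the registered set.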
The aggregation function is a declared parameter of the arbiter. Tier-dominant selection chooses the highest-tier admissible output and records the lower-tier outputs as observers. Tier-weighted average combines outputs proportionally to weight under a declared metric. Monotone envelope returns the output that is both admissible and most conservative under a declared ordering, which is the typical choice for safety-critical workloads. Bounded consensus returns a combined output only when weighted disagreement is below a declared bound, falling through to a recorded fallback otherwise.
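The four declared aggregation functions can be sketched for scalar outputs as follows. Several choices here are assumptions made for the example: the `Contribution` record, the use of "smaller is more conservative" as the declared ordering, weighted variance as the disagreement measure, and `None` as the signal to take the fallback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Aggregation(Enum):
    TIER_DOMINANT = "tier_dominant"
    TIER_WEIGHTED_AVERAGE = "tier_weighted_average"
    MONOTONE_ENVELOPE = "monotone_envelope"
    BOUNDED_CONSENSUS = "bounded_consensus"


@dataclass(frozen=True)
class Contribution:
    model: str
    tier: int
    weight: float
    value: float


def _weighted_mean(contribs: List[Contribution]) -> float:
    total = sum(c.weight for c in contribs)
    return sum(c.weight * c.value for c in contribs) / total


def aggregate(contribs: List[Contribution], fn: Aggregation,
              bound: Optional[float] = None) -> Optional[float]:
    """Combine scalar contributions; None means 'take the recorded fallback'."""
    if fn is Aggregation.TIER_DOMINANT:
        return max(contribs, key=lambda c: c.tier).value
    if fn is Aggregation.TIER_WEIGHTED_AVERAGE:
        return _weighted_mean(contribs)
    if fn is Aggregation.MONOTONE_ENVELOPE:
        # Assumed ordering: the smallest value is the most conservative.
        return min(c.value for c in contribs)
    if fn is Aggregation.BOUNDED_CONSENSUS:
        if bound is None:
            raise ValueError("bounded consensus requires a declared bound")
        mean = _weighted_mean(contribs)
        total = sum(c.weight for c in contribs)
        variance = sum(c.weight * (c.value - mean) ** 2 for c in contribs) / total
        return mean if variance <= bound else None
    raise ValueError(f"unknown aggregation function: {fn}")
```

With a tier-2 model of weight 3 reporting 10.0 and a tier-1 model of weight 1 reporting 14.0, the weighted average is 11.0 and the weighted variance is 3.0, so bounded consensus returns 11.0 under a bound of 5.0 and falls through under a bound of 1.0.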
The disagreement bound itself is a parameter and may be expressed as an absolute scalar, as a fraction of the highest-weight model's declared standard error, or as a structured predicate over the output schema. A debounce parameter requires the bound to be exceeded for a declared number of consecutive cycles before the fallback is taken, preventing single-cycle spikes from derailing otherwise stable arbitration.
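The debounce behaviour can be sketched with a small counter, assuming an absolute scalar bound. The class name `DebouncedBound` and its interface are hypothetical.

```python
class DebouncedBound:
    """Signals fallback only after the bound is exceeded for N consecutive cycles."""

    def __init__(self, bound: float, cycles: int) -> None:
        if cycles < 1:
            raise ValueError("cycles must be at least 1")
        self.bound = bound
        self.cycles = cycles
        self.run = 0  # consecutive cycles above the bound

    def update(self, disagreement: float) -> bool:
        """Record one cycle; return True when the fallback should be taken."""
        if disagreement > self.bound:
            self.run += 1
        else:
            self.run = 0  # a single quiet cycle resets the count
        return self.run >= self.cycles
```

With a bound of 1.0 and a debounce of three cycles, the sequence 2.0, 2.0, 0.5, 2.0, 2.0, 2.0 triggers the fallback only on the final cycle, because the quiet third cycle resets the run.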
Alternative Embodiments
In a homogeneous-tier embodiment, all registered models share the same tier and the arbiter operates as a pure consensus function. In a hierarchical embodiment, tiers are strictly ordered and lower tiers are admissible only when the higher tier abstains or attenuates below a configured threshold. In a specialist embodiment, models register narrow competence envelopes covering disjoint workload classes, and the arbiter typically receives outputs from a single model per cycle but retains the ability to invoke neighboring specialists when disagreement signals overlap.
The shared semantic state can be embodied as a single in-memory object, as a transactional store with snapshot isolation across model invocations, or as a distributed log with declared consistency guarantees. The arbiter is unchanged across these embodiments because it consumes only the structured proposed mutations and the registered weights; the durability and concurrency characteristics of the underlying store are matters of deployment rather than of mechanism.
Disagreement handling admits domain-specific extensions. In a clinical embodiment, persistent disagreement triggers an explicit human review path with the contested mutations preserved verbatim. In a vehicular embodiment, persistent disagreement collapses the response to the most conservative monotone envelope and triggers a controlled de-escalation. In a trading embodiment, persistent disagreement halts new positions while permitting maintenance of existing positions under the prior consensus.
Composition
The arbiter publishes its combined output, its per-model contributions, and its disagreement field through canonical fields consumed by downstream subsystems. The differential alarm subsystem treats sustained high disagreement as a divergence between observed and expected confidence at the ensemble level. The transition governor consumes the combined output as a candidate semantic mutation and admits, rejects, or decomposes it under its own admissibility rules. The lineage writer records the full arbitration record so that any past combined output can be reconstructed from the recorded inputs.
Because models read from and write to the shared semantic state through structured proposals rather than direct mutation, the arbiter composes cleanly with the transition mutation subsystem: each accepted combined output is itself a single mutation descriptor that the transition governor evaluates against the current state, regardless of how many underlying models contributed. This preserves a single deterministic admit, reject, or decompose decision per cycle even when the underlying inference is ensemble in character.
Prior-Art Distinction
Conventional ensemble systems combine model outputs through static voting or fixed weighted averages and treat the ensemble as an opaque source of a single output. Such systems do not expose disagreement as a governable field and do not bind ensemble outputs to a shared, structured semantic state. Conventional model-cascade systems route requests to a single model per cycle and discard the others, losing the disagreement signal entirely. Multi-model arbitration is distinguished from both classes by its explicit shared semantic state, its tier-weighted aggregation under a declared function, and its first-class treatment of disagreement as a governance input rather than as ensemble noise.
Implementation Considerations
Parallel invocation requires that models observe a consistent semantic state at the moment of invocation. Implementations therefore snapshot the shared state at cycle entry and present the snapshot to all admissible models, rather than allowing each model to read live state. This guarantees that disagreement reflects model behavior rather than read-skew across the invocation window. Proposed mutations are collected against the snapshot identifier, so the arbiter can determine which mutations commute and which must be sequenced.
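A minimal sketch of snapshotting at cycle entry, assuming the single in-memory embodiment of the shared state. The commutativity rule used here (proposals against the same snapshot that touch different fields commute, while proposals touching the same field must be sequenced) is an assumption for illustration, as are all names.

```python
import copy
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, List, Tuple

_snapshot_ids = itertools.count(1)

# Hypothetical proposal shape: (model, snapshot_id, field, value)
Proposal = Tuple[str, int, str, Any]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    state: dict


def take_snapshot(live_state: dict) -> Snapshot:
    """Deep-copy the live state so later writes cannot skew model reads."""
    return Snapshot(next(_snapshot_ids), copy.deepcopy(live_state))


def partition_proposals(proposals: List[Proposal], snapshot_id: int):
    """Split proposals into those that commute and those needing sequencing."""
    by_field = defaultdict(list)
    for proposal in proposals:
        if proposal[1] != snapshot_id:
            raise ValueError("proposal was made against a different snapshot")
        by_field[proposal[2]].append(proposal)
    commutable = [ps[0] for ps in by_field.values() if len(ps) == 1]
    sequenced = [p for ps in by_field.values() if len(ps) > 1 for p in ps]
    return commutable, sequenced
```

A transactional or log-based store would replace the deep copy with its own snapshot mechanism; the partitioning step would be unchanged.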
Trust slope is itself a recorded field per model and evolves as a function of the model's contribution history, the disagreement attributable to it, and the downstream acceptance of mutations to which it contributed. The disclosure contemplates a slope update rule that is monotone in agreement with accepted outcomes and bounded under a declared decay so that no single cycle can dominate a model's standing. The slope is the slow channel of governance; the per-cycle weight modulated by slope is the fast channel; the two together produce a contribution that is both stable across cycles and responsive to sustained behavior change.
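One update rule with the stated properties is an exponential moving average toward +1 on agreement with the accepted outcome and toward -1 otherwise. This is only one candidate, offered as an assumption: the text does not specify the rule, and the `decay` parameter here simply stands in for the declared decay.

```python
def update_slope(slope: float, agreed_with_accepted: bool, decay: float) -> float:
    """Move the slope a bounded step toward +1 (agreement) or -1 (disagreement).

    For a slope already in [-1, 1] the per-cycle change is decay * (target - slope),
    so its magnitude never exceeds 2 * decay and no single cycle can dominate.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    target = 1.0 if agreed_with_accepted else -1.0
    return (1.0 - decay) * slope + decay * target
```

The rule is monotone in the sense that, from the same starting slope, agreement always yields a higher result than disagreement, and a slope that starts in [-1, 1] stays in that range.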
Failure handling is a structural concern. A model that fails to return within a declared deadline is recorded as abstaining for that cycle, and the arbiter combines the remaining contributions under the same aggregation function with renormalized weights. Abstention repeated beyond a declared threshold places the model into an attenuated state pending review, and that transition is itself a structured event recorded in lineage. The arbiter therefore degrades smoothly under partial model failure rather than blocking on the slowest contributor.
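Deadline-based abstention with weight renormalization can be sketched as follows, using a tier-weighted average as the aggregation function. The function names, the thread-pool approach, and the treatment of a raised exception as an abstention are assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict


def invoke_with_deadline(models: Dict[str, Callable[[], float]],
                         weights: Dict[str, float], deadline_s: float):
    """Combine whatever returned in time; record the rest as abstaining."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(models)))
    futures = {pool.submit(fn): name for name, fn in models.items()}
    done, _not_done = wait(futures, timeout=deadline_s)
    pool.shutdown(wait=False)  # do not block on the slowest contributor
    # Assumption: a model that raised is treated the same as one that timed out.
    values = {futures[f]: f.result() for f in done if f.exception() is None}
    abstained = sorted(set(models) - set(values))
    total = sum(weights[name] for name in values)
    if total <= 0:
        return None, {}, abstained
    normalized = {name: weights[name] / total for name in values}
    combined = sum(normalized[name] * values[name] for name in values)
    return combined, normalized, abstained


def slow_model() -> float:
    """Stand-in for a contributor that misses the deadline."""
    time.sleep(1.0)
    return 99.0
```

The abstention list is what a lineage writer would record for the cycle; counting repeated abstentions against a threshold is left out of the sketch.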
Model registration and de-registration are themselves structural events. A model entering the registered set must declare its tier, competence envelope, output schema, and initial trust slope under a registration record committed to lineage; a model leaving the registered set must do so through a fade phase rather than through immediate removal, so that its final cycles of contribution are preserved in lineage and its absence does not produce a discontinuous jump in the arbiter's combined output. The disclosure contemplates a registration governor that admits new registrations only when their declarations are consistent with the shared semantic state schema and that rejects de-registrations of models currently dominant in active workload classes until alternative coverage is demonstrated. This binds the population of models to the same audited discipline that governs the population's outputs.
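The fade phase can be illustrated with a multiplier that ramps a departing model's weight down to zero over a declared number of cycles. The linear ramp and the function name are assumptions; the text requires only that removal be gradual rather than immediate.

```python
def fade_factor(cycles_since_deregistration: int, fade_cycles: int) -> float:
    """Assumed linear fade: 1.0 at the de-registration request, 0.0 once complete."""
    if fade_cycles < 1:
        raise ValueError("fade_cycles must be at least 1")
    if cycles_since_deregistration < 0:
        raise ValueError("cycle count cannot be negative")
    remaining = 1.0 - cycles_since_deregistration / fade_cycles
    return max(0.0, remaining)
```

Multiplying the model's effective weight by this factor each cycle lets its contribution shrink smoothly, so the combined output does not jump when the model finally leaves the registered set.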
The disagreement field schema is held stable across implementations and is consumed unchanged by downstream subsystems. A change to the schema is itself a structural mutation subject to audit. The shared semantic state schema is similarly held stable and is the canonical interface between the inference-control subsystem and the agent's broader cognitive architecture, which is what permits arbitration to be re-implemented without disturbing the governance and lineage subsystems that consume its output.
Disclosure Scope
The disclosure covers the registration of models with tier weights and competence envelopes, the parallel invocation against a shared semantic state, the declared aggregation functions, the disagreement field and its bounds, the fallback paths, the trust slope and its update rule, the failure handling discipline, the alternative embodiments enumerated above, and any system that reproduces the structural relationship between shared state, tier-weighted contribution, and logged disagreement. Implementations that reproduce this structural relationship fall within scope regardless of the specific model families employed or the specific aggregation function selected. The disclosure further covers any system that materializes per-model contribution and ensemble disagreement as first-class fields consumed by adjacent governance subsystems, regardless of whether arbitration is implemented as a discrete component or as a function distributed across the cognitive architecture.