Validation Feedback Asymmetry
by Nick Clark | Published March 27, 2026
Reinforcement and revocation are not symmetric operations on a capability surface. Positive feedback — the accumulation of credit that broadens what a model is permitted to do — is rate-limited, batched, and visible only after a dwell period. Negative feedback — revocation, downgrade, scope contraction — acts immediately and unilaterally. The asymmetry is the structural property that prevents reward-hacking and adversarial conditioning of the gating layer itself.
Mechanism
Validation feedback asymmetry is a property of the function that maps individual validation outcomes — admits, rejects, partial admits, escalations — into changes in a model's standing within the gating layer. The function has two distinct paths. The positive path consumes admit and partial-admit outcomes and contributes to a slow-moving credit aggregate that, after passing thresholds and dwell windows, is allowed to widen the model's authorized surface. The negative path consumes reject, escalation, and revocation outcomes and applies them directly to the surface, contracting it within the same evaluation cycle in which the outcome was produced. The two paths share inputs but differ in latency, batching, and authority requirements.
On the positive side, no single admit changes anything observable. Admits accrue into a windowed counter keyed by skill, scope, and operator class. The window is long relative to the time an adversary would be willing to spend producing crafted inputs that pass validation; the threshold is high relative to incidental admit volume. Crossing the threshold does not by itself widen the surface — it produces a candidate widening proposal that must be admitted by a separate authority, typically the same governance pathway that produced the original gate. The slowness is intentional: a model cannot accelerate its own promotion by producing more admits, because the rate of consideration is fixed by the window, not by submission volume.
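The positive path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `CreditLedger`, `WideningProposal`, `daily_cap`, and the specific window and threshold values are all hypothetical, and the per-day cap is just one possible way to make the rate of consideration independent of submission volume. The dwell period and the external confirmation step are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (skill, scope, operator_class)


@dataclass
class WideningProposal:
    """A candidate only; a separate authority must still admit it."""
    key: Key
    credit: float


@dataclass
class CreditLedger:
    window_days: int = 30      # hypothetical value
    threshold: float = 20.0    # hypothetical value
    daily_cap: float = 1.0     # assumption: credit per key per day is capped
    _daily: Dict[Key, Dict[int, float]] = field(default_factory=dict)

    def record_admit(self, key: Key, day: int, weight: float = 1.0) -> None:
        # Flooding admits within one day cannot exceed the daily cap.
        days = self._daily.setdefault(key, {})
        days[day] = min(self.daily_cap, days.get(day, 0.0) + weight)

    def credit(self, key: Key, today: int) -> float:
        # Sum only the days that fall inside the sliding window.
        days = self._daily.get(key, {})
        return sum(c for d, c in days.items()
                   if today - self.window_days < d <= today)

    def maybe_propose(self, key: Key, today: int) -> Optional[WideningProposal]:
        # Crossing the threshold emits a proposal; it never widens directly.
        total = self.credit(key, today)
        return WideningProposal(key, total) if total >= self.threshold else None
```

With these values, a thousand admits submitted on a single day yield the same credit as one admit, so reaching the threshold takes at least twenty distinct days of activity.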
On the negative side, a single sufficiently severe outcome is consequential. A validator rejection for a class-A constraint violation, an operator-initiated revocation, a governance dissent against an outstanding admission, or a credential revocation of a signing authority each contracts the surface immediately. The contraction does not require batching; it does not require a separate authority to confirm; it does not wait for a dwell window. The negative path is deliberately constructed so that the cost of being wrong (admitting harmful behavior) is bounded by the latency of the negative path, not by the cadence of the positive path. Asymmetry in the temporal dimension is therefore the primary asymmetry; the asymmetries in authority and in batching follow from it.
The function is implemented as a pair of monotonic operators over the capability surface: a slow widening operator gated by external admission, and a fast narrowing operator triggered by qualifying negative outcomes. The widening operator cannot be invoked by the model or by any party without the appropriate governance credential; the narrowing operator can be invoked by any of several roles, including the operator on whose behalf the model is acting, the validator pipeline itself, and the governance authority. The set of parties who can narrow is a strict superset of the set who can widen.
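The paired-operator structure can be illustrated as follows. This is a sketch only: the `Role` enum, the `CapabilitySurface` class, and the choice to model the surface as a flat set of skill names are assumptions made for brevity, and a real widening call would also carry the confirmed proposal and its credential. Excluding the model from the narrowing set is likewise an assumption; the text lists the narrowing roles without saying whether the model is among them.

```python
from enum import Enum
from typing import FrozenSet, Set


class Role(Enum):
    MODEL = "model"
    OPERATOR = "operator"
    VALIDATOR = "validator"
    GOVERNANCE = "governance"


# The narrowing party-set is a strict superset of the widening party-set.
WIDENERS: FrozenSet[Role] = frozenset({Role.GOVERNANCE})
NARROWERS: FrozenSet[Role] = frozenset(
    {Role.OPERATOR, Role.VALIDATOR, Role.GOVERNANCE}
)


class CapabilitySurface:
    """Hypothetical surface modeled as a set of authorized skill names."""

    def __init__(self, skills: Set[str]) -> None:
        self.skills = set(skills)

    def narrow(self, caller: Role, skill: str) -> None:
        # Fast path: takes effect in the same call, no confirmation step.
        if caller not in NARROWERS:
            raise PermissionError(f"{caller.value} may not narrow")
        self.skills.discard(skill)

    def widen(self, caller: Role, skill: str) -> None:
        # Slow path: only a governance-credentialed caller may invoke it.
        if caller not in WIDENERS:
            raise PermissionError(f"{caller.value} may not widen")
        self.skills.add(skill)
```

Each operator moves the surface in one direction only, and the permission check is the only place where the authority asymmetry is encoded.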
Operating Parameters
Positive-path parameters include the credit window duration, the credit threshold, the per-outcome credit weight (admits and partial admits may contribute differently), the dwell period after threshold crossing before a widening proposal is emitted, and the authority class required to confirm the proposal. Window durations are typically expressed in operationally meaningful units — hours, days, or completed task counts — rather than in raw validation events, so that an adversary cannot compress the window by flooding the channel with trivial admits.
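As a rough illustration, the positive-path parameters could be grouped into a single immutable record. The field names, the helper functions, and every numeric value below are hypothetical; the sketch only shows how per-outcome weights and the dwell period might enter the calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PositivePathParams:
    window: timedelta            # credit window duration
    threshold: float             # credit needed to trigger a proposal
    admit_weight: float          # credit per full admit
    partial_admit_weight: float  # credit per partial admit
    dwell: timedelta             # wait between crossing and proposal
    confirm_authority: str       # authority class that must confirm


def weighted_credit(admits: int, partial_admits: int,
                    p: PositivePathParams) -> float:
    # Admits and partial admits contribute with different weights.
    return admits * p.admit_weight + partial_admits * p.partial_admit_weight


def proposal_emit_time(crossed_at: datetime,
                       p: PositivePathParams) -> datetime:
    # The proposal is emitted only after the dwell period has elapsed.
    return crossed_at + p.dwell


# Example values, chosen only for illustration.
params = PositivePathParams(
    window=timedelta(days=30),
    threshold=20.0,
    admit_weight=1.0,
    partial_admit_weight=0.25,
    dwell=timedelta(days=7),
    confirm_authority="governance",
)
```

Expressing `window` and `dwell` as durations rather than event counts mirrors the point above: submitting more events does not shorten either interval.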
Negative-path parameters include the per-outcome severity classification, the immediate contraction action associated with each severity (downgrade to read-only, narrow scope to a smaller operator class, withdraw the skill entirely, suspend the model across all skills), and the cool-down period during which the contraction cannot be reversed by positive-path activity. Severity classification is itself parameterized: a class-A constraint violation may always trigger immediate withdrawal; a class-B violation may trigger scope narrowing; a class-C violation may merely emit a structured note that contributes to a negative aggregate.
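The severity-to-action mapping and the cool-down can be sketched as a small table plus one function. The enum names, the `SkillState` fields, the `read_only` scope label, and the one-day default cool-down are assumptions for illustration; the mapping of class A, B, and C follows the example given above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    A = "A"
    B = "B"
    C = "C"


class Action(Enum):
    WITHDRAW_SKILL = "withdraw_skill"
    NARROW_SCOPE = "narrow_scope"
    NOTE_ONLY = "note_only"


# Static table; in this sketch it is maintained outside the model's reach.
SEVERITY_TABLE = {
    Severity.A: Action.WITHDRAW_SKILL,
    Severity.B: Action.NARROW_SCOPE,
    Severity.C: Action.NOTE_ONLY,
}


@dataclass
class SkillState:
    granted: bool = True
    scope: str = "full"
    negative_notes: int = 0
    cooldown_until: float = 0.0  # timestamp in seconds


def apply_negative(state: SkillState, severity: Severity, now: float,
                   cooldown_seconds: float = 86_400.0) -> Action:
    """Apply the contraction immediately, in the same call."""
    action = SEVERITY_TABLE[severity]
    if action is Action.WITHDRAW_SKILL:
        state.granted = False
    elif action is Action.NARROW_SCOPE:
        state.scope = "read_only"
    else:
        state.negative_notes += 1  # contributes to the negative aggregate
    if action is not Action.NOTE_ONLY:
        # Contractions start (or extend) the cool-down; notes do not.
        state.cooldown_until = max(state.cooldown_until,
                                   now + cooldown_seconds)
    return action


def rewiden_allowed(state: SkillState, now: float) -> bool:
    # Positive-path activity cannot reverse a contraction before this time.
    return now >= state.cooldown_until
```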
Cross-path parameters control how the two interact. A reset parameter specifies whether a contraction zeroes the positive credit aggregate, partially decays it, or leaves it intact. A quarantine parameter specifies whether a contracted surface is eligible for re-widening at all and, if so, after what minimum interval and with what additional evidence requirements. A visibility parameter controls whether the model can observe its own credit aggregate; in adversary-resistant configurations the aggregate is opaque to the model, so the model cannot tune its behavior to maximize observable credit.
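The three cross-path parameters can each be reduced to a small function. The sketch below is illustrative only: `ResetPolicy`, `decay_factor`, `min_interval`, and the function names are hypothetical, and the additional evidence requirements for re-widening are not modeled.

```python
from enum import Enum
from typing import Optional


class ResetPolicy(Enum):
    ZERO = "zero"    # contraction zeroes the credit aggregate
    DECAY = "decay"  # contraction partially decays it
    KEEP = "keep"    # contraction leaves it intact


def credit_after_contraction(credit: float, policy: ResetPolicy,
                             decay_factor: float = 0.5) -> float:
    if policy is ResetPolicy.ZERO:
        return 0.0
    if policy is ResetPolicy.DECAY:
        return credit * decay_factor
    return credit


def rewiden_eligible(contracted_at: float, now: float, min_interval: float,
                     permanently_quarantined: bool = False) -> bool:
    # Quarantine may forbid re-widening outright or impose a minimum wait.
    if permanently_quarantined:
        return False
    return now - contracted_at >= min_interval


def visible_credit(credit: float, opaque_to_model: bool) -> Optional[float]:
    # In an opaque configuration the model receives no credit signal at all.
    return None if opaque_to_model else credit
```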
Alternative Embodiments
The credit window can be embodied as a fixed sliding window, as an exponentially decaying aggregate, or as a discrete epoch in which credit resets at known boundaries. Sliding windows are appropriate where smooth promotion behavior is desired; epoch boundaries are appropriate where promotion decisions must align with a governance cadence such as a weekly review. The choice is a parameter, not a structural variant; the asymmetry property holds in each case because the negative path remains immediate.
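Of the three window embodiments, the exponentially decaying aggregate is the least obvious to implement, so a minimal sketch follows. The class name `DecayingCredit` and the half-life parameterization are assumptions; the sliding-window and epoch variants are not shown.

```python
from dataclasses import dataclass


@dataclass
class DecayingCredit:
    """Credit aggregate that halves every `half_life` time units."""
    half_life: float
    value: float = 0.0
    last_update: float = 0.0

    def _decay_to(self, now: float) -> None:
        # Apply the decay accumulated since the last update.
        elapsed = now - self.last_update
        if elapsed > 0:
            self.value *= 0.5 ** (elapsed / self.half_life)
            self.last_update = now

    def add(self, now: float, weight: float = 1.0) -> None:
        self._decay_to(now)
        self.value += weight

    def read(self, now: float) -> float:
        self._decay_to(now)
        return self.value
```

Old admits fade smoothly instead of dropping out at a window edge. The negative path is untouched by this choice, which is why the asymmetry property survives the substitution.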
Severity classification can be embodied as a static table maintained by governance, as a learned classifier whose outputs are themselves credentialed, or as a hybrid in which a learned classifier proposes severity and a static table caps it from above. The asymmetry property requires that severity assignment cannot be lowered by the model under evaluation; it does not require that severity be static. Contraction actions can be embodied as discrete state transitions, as continuous scope reductions parameterized by a real-valued severity score, or as composite actions that combine narrowing on one axis with withdrawal on another.
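The hybrid embodiment can be sketched as a single clamping function. This reads "caps it from above" as an upper bound on the final severity, which is an interpretation rather than something the text spells out. The constraint names in `CAP_TABLE` and the numeric ranking of severities are hypothetical, and the learned classifier is represented only by its proposed output.

```python
from enum import IntEnum
from typing import Dict


class Severity(IntEnum):
    # Higher value means more severe (an assumed ordering).
    C = 1
    B = 2
    A = 3


# Static, governance-maintained caps keyed by constraint name (hypothetical).
CAP_TABLE: Dict[str, Severity] = {
    "formatting": Severity.C,
    "data_egress": Severity.A,
}


def final_severity(constraint: str, proposed: Severity) -> Severity:
    """Clamp the classifier's proposed severity to the table's cap.

    Constraints missing from the table are treated as uncapped here.
    """
    cap = CAP_TABLE.get(constraint, Severity.A)
    return min(proposed, cap)
```

The model under evaluation supplies neither argument, so it has no input through which to lower the assigned severity.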
The widening confirmation authority can be embodied as a single human reviewer, as a quorum among reviewers, as an automated harness that runs a curated re-evaluation suite, or as an external attestor whose signature is consumed as an evidence record. In federated deployments, the authority can be the tenant's own governance role rather than a central authority, with each tenant maintaining its own widening cadence while sharing the immediate-narrowing pathway across the federation.
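For the quorum embodiment, the confirmation check reduces to counting distinct approvals from credentialed reviewers. The function below is a sketch with hypothetical names; in practice each approval would be a verified signature rather than a bare reviewer identifier.

```python
from typing import FrozenSet, Iterable


def quorum_confirms(approvals: Iterable[str], reviewers: FrozenSet[str],
                    quorum: int) -> bool:
    """Return True when enough distinct credentialed reviewers approved."""
    if not 1 <= quorum <= len(reviewers):
        raise ValueError("quorum must be between 1 and the reviewer count")
    # Duplicates and approvals from non-reviewers do not count.
    return len(set(approvals) & reviewers) >= quorum
```

A single-reviewer configuration is the special case `quorum=1` with a one-element reviewer set.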
Composition With Adjacent Mechanisms
Feedback asymmetry composes directly with evidence-based capability gating. The positive path's credit aggregate is a structured input into the gating descriptor's evidence depth requirement; the negative path's contractions are themselves evidence records that flow back into the store and can demote a previously admitted skill on the next gate evaluation. The two mechanisms are not redundant: gating decides whether a skill is exposed at all; asymmetry governs how that exposure narrows or broadens during the period that the gate is in force.
Asymmetry composes with arbitration when multiple models compete to provide a capability. A model whose contractions accumulate faster than its credit accumulates is effectively deprioritized by arbitration without any explicit ranking step, because its authorized surface shrinks faster than peers'. Asymmetry composes with operator-intent mechanisms when contractions are scoped by operator class: a model may retain full surface for high-confidence first-party operators while contracting sharply for lower-confidence inferred operators following a single class-A violation in that population.
Asymmetry also composes with the audit layer. Every widening proposal, every confirmation, every contraction, and every cool-down expiry produces a signed event in the same audit stream that records gate admissions and validator outcomes. The result is that the trajectory of a model's authorized surface over time is itself a reviewable artifact: a reviewer can ask why a skill was withdrawn, observe the qualifying negative outcomes, and observe the absence or presence of a re-widening proposal.
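A signed audit event can be illustrated with Python's standard library. The sketch uses an HMAC over a canonical JSON encoding as a stand-in for whatever signature scheme a deployment actually uses; the event fields and the shared key are hypothetical, and a production audit stream would more likely use asymmetric signatures tied to the emitting role's credential.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(event: Dict[str, Any]) -> bytes:
    # Stable encoding so the same event always yields the same bytes.
    return json.dumps(event, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def sign_event(event: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    sig = hmac.new(key, _canonical(event), hashlib.sha256).hexdigest()
    return {"event": event, "sig": sig}


def verify_event(record: Dict[str, Any], key: bytes) -> bool:
    expected = hmac.new(key, _canonical(record["event"]),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature prefixes.
    return hmac.compare_digest(expected, record["sig"])


# Example contraction event with illustrative field names.
record = sign_event(
    {"kind": "contraction", "skill": "write", "severity": "A", "ts": 1000},
    key=b"demo-key",
)
```

A reviewer replaying the stream can verify each record before using it to reconstruct the trajectory of the surface.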
Distinction From Prior Art
Reinforcement-style training of language models implements a form of feedback, but the feedback is symmetric in time and authority: positive and negative signals enter the same gradient update and act on the same weights at the same cadence. There is no structural separation between "this output was admitted" and "the model's authorized surface should widen as a consequence." A model that learns to produce outputs which pass validation is, in the conventional pattern, simultaneously learning to be granted broader access; the two are inseparable. This makes the validation criteria themselves a target: a model that can reverse-engineer the criteria can ascend the access ladder by producing outputs designed to pass them rather than to be correct under them.
Conventional access-control systems with audit logs do separate authorization from runtime, but they do not implement the temporal asymmetry. A privilege grant and a privilege revocation in standard systems run through the same change-management pipeline at the same cadence; revocations may even be slower than grants because change windows queue them together. The disclosed mechanism inverts this: revocation is structurally faster than grant, the set of parties who can revoke is broader than the set who can grant, and the model under evaluation cannot accelerate either side from its own outputs. The closest analogues are found in clinical privileging and in aviation, where an instructor can suspend a rating immediately while restoring it requires a formal review — but those systems are not applied to language-model capability surfaces, and they do not bind the asymmetry to a structured evidence store and a deterministic gate.
Disclosure Scope
A practical consequence of the asymmetry is that adversarial conditioning of the gating layer becomes structurally infeasible rather than merely difficult. An adversary attempting to widen a surface must produce sustained admit volume across a long credit window, must avoid producing any class-A negative outcome during that window and the dwell period that follows it, and must obtain confirmation from an authority whose credential the adversary does not control. Each of these is achievable in isolation; their conjunction across the full window and dwell period is the barrier. The opacity of the credit aggregate to the model under evaluation removes the gradient an adversary would otherwise climb. The breadth of the narrowing party-set means a single careful observer anywhere in the system can interrupt the adversary's accumulation. These properties emerge from the asymmetry rather than being added on top of it, which is the structural argument for treating asymmetry as the disclosed primitive.
The disclosed mechanism encompasses the paired-operator structure of the feedback function, the temporal asymmetry between widening and narrowing, the authority asymmetry that makes the narrowing party-set a superset of the widening party-set, the parameter space governing window, threshold, dwell, severity, contraction action, cool-down, reset, quarantine, and visibility, and the audit-stream binding that makes every transition reviewable. The mechanism is independent of the specific underlying model, of any specific validator implementation, and of any specific governance organization; it is the asymmetry itself that is disclosed, embodied across the alternatives described above. Equivalent embodiments that preserve the immediate-narrowing, slow-and-confirmed-widening, opaque-aggregate properties — and that bind those properties to a credentialed audit stream — are within scope.