Trust Weight Calibration and Decay

by Nick Clark | Published March 27, 2026

Trust weights attached to skill outputs are calibrated against a rolling history of observed-versus-predicted deviation, and each new deviation adjusts the weight applied to that skill's future outputs, tightening it whenever the skill overclaims. The calibration record is itself a mandatory artefact subject to audit: a skill whose calibration record is missing or stale is non-admissible to the gated execution path.

Mechanism

Trust weight calibration is a feedback mechanism that binds the credibility of a skill's output to its empirical track record. As defined in the cognition patent's skill-gating chapter, every output produced by a gated skill carries a trust weight, a scalar drawn from a bounded interval that downstream consumers, validators, and arbitrators use to modulate how strongly the output influences subsequent state. The trust weight is not assigned a priori by the skill's author; it is the running output of a calibration function that the gating runtime maintains for each skill identity, fed by the rolling history of that skill's prior emissions and their post-hoc validations.

The calibration function operates on tuples consisting of a predicted-confidence value emitted by the skill at the moment of output, an observed-correctness value derived after the output's effects are evaluated against a ground-truth signal, and a temporal stamp anchoring the tuple in the agent's lineage. Predicted confidence may originate from the underlying model's likelihood estimate, from an explicit self-report channel declared in the skill's manifest, or from a structural property of the skill's output schema. Observed correctness is computed by a validator declared in the skill manifest, which may be a rule-based check, a downstream-effect classifier, a human-in-the-loop adjudication, or a comparison against a deferred ground-truth oracle.

For each new tuple, the calibration function computes a deviation, the signed difference between predicted and observed values, and folds the deviation into a sufficient-statistic vector summarizing the skill's calibration profile. The vector is a fixed-dimension structure encoding the skill's mean over-confidence, mean under-confidence, variance of deviations, and decay-weighted recency of error, among other moments specified by the active calibration policy. The trust weight assigned to the next output is computed deterministically from this vector through a published mapping function that the policy reference governs.
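The fold described above can be illustrated with a minimal sketch. It assumes that deviation is taken as predicted minus observed (so a positive value is an overclaim), that both values lie in [0, 1], and that the vector tracks only three moments; the names CalibrationTuple, CalibrationStats, and trust_weight, and the particular mapping formula, are hypothetical and are not taken from the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationTuple:
    predicted: float  # predicted confidence, assumed to lie in [0, 1]
    observed: float   # observed correctness, assumed to lie in [0, 1]
    timestamp: float  # temporal stamp anchoring the tuple in lineage


@dataclass
class CalibrationStats:
    """Fixed-dimension summary updated incrementally, one tuple at a time."""
    n: int = 0
    mean_over: float = 0.0   # running mean of max(deviation, 0)
    mean_under: float = 0.0  # running mean of max(-deviation, 0)
    mean_dev: float = 0.0    # running mean of signed deviation
    m2: float = 0.0          # Welford accumulator for the variance

    def fold(self, t: CalibrationTuple) -> None:
        d = t.predicted - t.observed  # signed deviation; > 0 means overclaim
        self.n += 1
        self.mean_over += (max(d, 0.0) - self.mean_over) / self.n
        self.mean_under += (max(-d, 0.0) - self.mean_under) / self.n
        delta = d - self.mean_dev
        self.mean_dev += delta / self.n
        self.m2 += delta * (d - self.mean_dev)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0


def trust_weight(stats: CalibrationStats) -> float:
    """Hypothetical mapping: penalize over-confidence and spread, clamp to [0, 1]."""
    raw = 1.0 - stats.mean_over - 0.5 * math.sqrt(stats.variance)
    return min(1.0, max(0.0, raw))
```

Because trust_weight reads only the summary vector, the same history always yields the same weight, which is the determinism the mapping function is required to have.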

The tightening property is fundamental: a deviation in which observed correctness fell below predicted confidence reduces the multiplier applied to subsequent predictions, narrowing the band in which downstream consumers will accept the skill's claims at face value. The tightening is asymmetric: deviations in which the skill underclaimed and was nevertheless correct loosen the weight only gradually, while deviations in which the skill overclaimed tighten the weight sharply. This asymmetry is intentional and is what causes the system to converge toward conservative calibration over time.
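One way to picture the asymmetry is the following sketch, in which the function name update_multiplier and the two rate constants are hypothetical stand-ins for the deviation-asymmetry constants that the policy would declare. Overclaims shrink the multiplier multiplicatively; underclaims close only a small fraction of the remaining gap to 1.0.

```python
def update_multiplier(multiplier: float, predicted: float, observed: float,
                      tighten_rate: float = 0.5, loosen_rate: float = 0.05) -> float:
    """Apply one deviation to a trust multiplier in [0, 1].

    tighten_rate and loosen_rate are assumed, illustrative constants;
    tighten_rate is deliberately much larger than loosen_rate.
    """
    deviation = predicted - observed
    if deviation > 0:
        # Overclaim: sharp multiplicative tightening.
        multiplier *= 1.0 - tighten_rate * deviation
    elif deviation < 0:
        # Underclaim: gradual recovery toward 1.0.
        multiplier += loosen_rate * (-deviation) * (1.0 - multiplier)
    return min(1.0, max(0.0, multiplier))
```

With these assumed constants, a single overclaim of magnitude 0.8 takes a multiplier of 1.0 down to 0.6, while an underclaim of the same magnitude afterwards recovers only 0.016.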

Decay is the second invariant. The sufficient-statistic vector is maintained as a rolling window, with older tuples weighted geometrically less than recent ones according to a half-life parameter declared per skill. Decay ensures that a skill which corrects its behavior, whether through a model upgrade, a manifest revision, or a change in operating conditions, can recover trust without discarding its history. The decay constant is not freely chosen at runtime; it is part of the calibration record and is itself audited.
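Geometric weighting by half-life can be maintained incrementally, without storing the window, as in the sketch below. The class name DecayedMean is hypothetical, timestamps are assumed to be non-decreasing, and the tracked quantity is a single deviation mean rather than the full vector.

```python
class DecayedMean:
    """Exponentially decayed mean: a value's weight halves every half_life time units."""

    def __init__(self, half_life: float) -> None:
        self.half_life = half_life
        self.weighted_sum = 0.0
        self.weight_total = 0.0
        self.last_t = None  # timestamp of the most recent tuple, if any

    def add(self, value: float, t: float) -> None:
        if self.last_t is not None:
            # Age everything already folded in by the elapsed time.
            factor = 0.5 ** ((t - self.last_t) / self.half_life)
            self.weighted_sum *= factor
            self.weight_total *= factor
        self.weighted_sum += value
        self.weight_total += 1.0
        self.last_t = t

    def mean(self) -> float:
        return self.weighted_sum / self.weight_total if self.weight_total else 0.0
```

For example, with a half-life of 10, a deviation of 1.0 at time 0 followed by a deviation of 0.0 at time 10 yields a decayed mean of one third rather than the undecayed one half: the old error still counts, but for less.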

The audit-required property closes the structural loop. The calibration record, comprising the sufficient-statistic vector, the half-life parameter, the mapping function reference, and a hash chain over the contributing tuples, is a mandatory artefact that the gating runtime persists alongside the skill's manifest. Each invocation of the skill includes, in its lineage emission, a reference to the active calibration record version. A skill whose calibration record is missing, whose hash chain is broken, or whose record version is older than the policy-declared maximum staleness window is downgraded to non-admissible: the gating runtime will not route its outputs into canonical state until the record is regenerated under a current validator pass.
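The admissibility predicate can be sketched as follows. The sketch assumes SHA-256 over canonical JSON for the hash chain, tuples represented as plain dictionaries, and a staleness window expressed in elapsed time; CalibrationRecord, link_hash, admissible, and the GENESIS anchor are hypothetical names, and the disclosure does not prescribe a hash function or serialization.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # assumed anchor value for the first link in the chain


def link_hash(prev_hash: str, tup: dict) -> str:
    """Hash one tuple together with the previous link's hash."""
    payload = json.dumps(tup, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


@dataclass
class CalibrationRecord:
    stats: dict            # sufficient-statistic vector
    half_life: float
    mapping_ref: str
    last_validated: float  # time of the most recent validator pass
    tuples: list = field(default_factory=list)
    hashes: list = field(default_factory=list)

    def append(self, tup: dict) -> None:
        prev = self.hashes[-1] if self.hashes else GENESIS
        self.tuples.append(tup)
        self.hashes.append(link_hash(prev, tup))

    def chain_intact(self) -> bool:
        if len(self.tuples) != len(self.hashes):
            return False
        prev = GENESIS
        for tup, h in zip(self.tuples, self.hashes):
            if link_hash(prev, tup) != h:
                return False
            prev = h
        return True


def admissible(record, now: float, max_staleness: float) -> bool:
    """Missing, broken, or stale records all make the skill non-admissible."""
    if record is None:
        return False
    if not record.chain_intact():
        return False
    return (now - record.last_validated) <= max_staleness
```

Any after-the-fact edit to a contributing tuple changes its link hash and every later one, so chain_intact fails and the skill drops out of the gated path until the record is regenerated.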

Operating Parameters

The calibration regime is parameterized at three levels. At the skill level, the manifest declares the validator identity, the predicted-confidence channel, the half-life constant, and the mapping function reference. At the policy level, the active policy reference declares the calibration epoch, the maximum staleness window, the deviation-asymmetry constants, and the audit emission cadence. At the deployment level, the gating runtime exposes a tier selector that composes skill and policy parameters into a concrete operating profile.

The maximum staleness window is the parameter most directly tied to the audit-required property. It bounds the elapsed time, or alternatively the number of intervening invocations, between the most recent validator pass and the current invocation. A short window enforces frequent re-validation, suiting domains in which the skill's environment drifts rapidly. A long window suits domains in which validation is expensive and the skill's behavior is stable. The window may also be expressed as a function of the skill's recent deviation magnitude, contracting automatically when deviations grow.
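A deviation-sensitive window of the kind just described might look like the sketch below. The reciprocal shrink formula, the sensitivity constant, and the floor are illustrative assumptions; the same check applies whether the window counts elapsed time or intervening invocations.

```python
def effective_window(base_window: float, recent_deviation: float,
                     sensitivity: float = 4.0, floor_fraction: float = 0.1) -> float:
    """Contract the staleness window as recent deviation magnitude grows.

    sensitivity and floor_fraction are hypothetical policy constants; the
    floor keeps the window from collapsing to zero under extreme deviation.
    """
    shrink = 1.0 / (1.0 + sensitivity * abs(recent_deviation))
    return base_window * max(floor_fraction, shrink)


def is_stale(elapsed: float, window: float) -> bool:
    """elapsed may be seconds or an invocation count, matching the window's unit."""
    return elapsed > window
```

Under these assumed constants, a base window of 100 stays at 100 with no recent deviation and halves to 50 at a recent deviation magnitude of 0.25, so a record last validated 60 units ago is current in the first case and stale in the second.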

The mapping function reference is bound to a versioned registry, allowing the gating runtime to migrate skills from one mapping to another in a controlled fashion. A migration emits a transition record in lineage, captures the skill's sufficient-statistic vector at the boundary, and re-derives the trust weight under the new mapping. This permits domain-specific recalibration policies to evolve without invalidating prior history.
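A controlled migration can be sketched as below, assuming an in-memory registry keyed by versioned reference strings. The registry contents, the reference naming scheme ("linear@1", "strict@2"), and the shape of the transition record are all hypothetical.

```python
def clamp(x: float) -> float:
    return min(1.0, max(0.0, x))


# Hypothetical versioned registry: reference string -> mapping from stats to weight.
MAPPING_REGISTRY = {
    "linear@1": lambda s: clamp(1.0 - s["mean_over"]),
    "strict@2": lambda s: clamp(1.0 - 2.0 * s["mean_over"]),
}


def migrate(record: dict, new_ref: str, lineage: list) -> float:
    """Switch a record to a new mapping and return the re-derived trust weight."""
    if new_ref not in MAPPING_REGISTRY:
        raise KeyError(f"unknown mapping reference: {new_ref}")
    # Emit a transition record capturing the vector at the boundary.
    lineage.append({
        "event": "mapping_transition",
        "from": record["mapping_ref"],
        "to": new_ref,
        "stats_snapshot": dict(record["stats"]),
    })
    record["mapping_ref"] = new_ref
    # The history itself is untouched; only the derived weight changes.
    return MAPPING_REGISTRY[new_ref](record["stats"])
```

The sufficient-statistic vector is read but never rewritten during the migration, which is what allows the mapping to change without invalidating prior history.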

Alternative Embodiments

In a first alternative embodiment, the validator producing observed-correctness values is a federation of independent validators whose outputs are combined by a quorum function before entering the calibration tuple. This embodiment increases robustness against validator compromise and allows independent auditors to participate.
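As an illustration of this embodiment, the quorum function could be as simple as a median over validator scores with a minimum participation requirement. The median is one assumed choice among many; the function name and the min_votes parameter are hypothetical.

```python
from statistics import median


def quorum_correctness(votes: list, min_votes: int = 3) -> float:
    """Combine independent validator scores into one observed-correctness value.

    The median tolerates a minority of compromised or faulty validators.
    """
    if len(votes) < min_votes:
        raise ValueError("quorum not reached")
    return median(votes)
```

With three validators, a single compromised validator reporting 0.0 against two honest reports of 1.0 leaves the combined value at 1.0.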

In a second alternative embodiment, the trust weight is a vector rather than a scalar, with components corresponding to distinct dimensions of correctness, for example factual accuracy, format compliance, and temporal freshness. Downstream consumers select the relevant component for their use case, and tightening occurs per component.

In a third alternative embodiment, the calibration record is committed to an external append-only log, separate from the agent's lineage, with the gating runtime verifying log inclusion before admitting outputs. This embodiment supports cross-agent and cross-organization trust portability.

In a fourth alternative embodiment, decay is replaced by a change-point detector that segments the calibration history into stationary regimes and computes the trust weight only over the most recent regime. This embodiment suits skills whose behavior changes abruptly rather than gradually.
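The segmentation idea can be shown with a deliberately naive detector that opens a new regime whenever a deviation departs from the running mean of the current regime by more than a threshold. This is an assumption made for illustration only; a production detector would likely use a more robust method, and the function name and threshold value are hypothetical.

```python
def latest_regime(deviations: list, threshold: float = 0.3) -> list:
    """Return the deviations belonging to the most recent stationary regime."""
    start = 0    # index where the current regime begins
    mean = 0.0   # running mean of the current regime
    count = 0    # number of deviations in the current regime
    for i, d in enumerate(deviations):
        if count and abs(d - mean) > threshold:
            # Abrupt shift: start a new regime at this deviation.
            start, mean, count = i, d, 1
        else:
            count += 1
            mean += (d - mean) / count
    return deviations[start:]
```

A skill whose deviations run 0.5, 0.6, 0.55 and then drop to 0.0, 0.05, 0.02 after an upgrade would be scored only on the last three values, so the earlier overclaiming regime no longer depresses its weight.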

In a fifth alternative embodiment, predicted-confidence values are derived from a wrapper model rather than from the gated skill itself, with the wrapper trained to estimate the skill's correctness from features of the skill's input and output. This embodiment supports skills that lack an internal confidence channel and preserves the calibration mechanism's structural properties without requiring modification of the underlying skill implementation.

In a sixth alternative embodiment, the calibration record is sharded by input-feature class, with separate sufficient-statistic vectors maintained for each class as identified by a feature classifier declared in the manifest. The trust weight applied to a given output is drawn from the shard whose feature class matches the input, allowing skills with heterogeneous reliability across input regimes to be calibrated faithfully without a single global weight that would misrepresent every regime.
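A sharded record might be sketched as follows, with each shard reduced to a count and a mean overclaim for brevity. The class name, the linear weight formula, and the choice to return a weight of 0.0 for a feature class with no history are assumptions; the feature classifier is passed in as a plain callable.

```python
from collections import defaultdict


class ShardedCalibration:
    """Keep a separate (count, mean overclaim) summary per input-feature class."""

    def __init__(self, classify) -> None:
        self.classify = classify  # callable: skill input -> feature-class key
        self.shards = defaultdict(lambda: [0, 0.0])

    def record(self, skill_input, predicted: float, observed: float) -> None:
        shard = self.shards[self.classify(skill_input)]
        shard[0] += 1
        overclaim = max(predicted - observed, 0.0)
        shard[1] += (overclaim - shard[1]) / shard[0]

    def trust_weight(self, skill_input) -> float:
        key = self.classify(skill_input)
        if key not in self.shards:
            return 0.0  # assumed conservative default for an unseen class
        return max(0.0, 1.0 - self.shards[key][1])
```

A skill that is reliable on short inputs and unreliable on long ones then carries two distinct weights, rather than one averaged weight that describes neither regime.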

Composition with Other Mechanisms

Trust-weight calibration composes with the skill security layer: each sanitization rejection or output filtering event is a calibration tuple in its own right, contributing to the rolling history and tightening the weight when security events recur. Skills that frequently trigger their security layers see their outputs marginalized in downstream consumption, even when each individual output passed.

Calibration composes with arbitration among parallel skills: when multiple skills produce candidate outputs for the same downstream slot, their trust weights determine the arbitrator's selection or weighted combination. A well-calibrated skill that is rarely correct may outweigh a poorly calibrated skill that is frequently correct when the arbitrator weighs deviation history.
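Both arbitration modes can be sketched in a few lines, assuming each candidate arrives as an (output, trust weight) pair and that weighted combination applies only to numeric outputs. The function names are hypothetical.

```python
def select(candidates: list):
    """Pick the output of the candidate with the highest trust weight."""
    return max(candidates, key=lambda c: c[1])[0]


def combine(candidates: list) -> float:
    """Trust-weighted average of numeric candidate outputs."""
    total = sum(weight for _, weight in candidates)
    if total == 0:
        raise ValueError("no candidate carries any trust")
    return sum(value * weight for value, weight in candidates) / total
```

For instance, two numeric candidates of 10.0 and 20.0 with weights 0.25 and 0.75 combine to 17.5, pulled toward the more trusted skill.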

Calibration composes with capability-awareness by allowing capability-tier transitions to invalidate calibration history selectively. A skill operating under a contracted capability envelope draws on a sufficient-statistic vector kept separate from the one accumulated under its expanded envelope, preventing inappropriate carry-over of trust earned under different operating conditions.

Calibration composes with planner backtracking: when the planner's chosen chain produces a poor outcome, the post-hoc validator's deviation signals are routed not only to the immediately responsible skill but also, with attenuated weight, to upstream skills whose outputs constrained the chosen skill's behavior. The attenuated propagation lets calibration capture diffuse responsibility without conflating it with direct responsibility, and the attenuation factor is itself a policy parameter subject to audit.
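The attenuated routing can be illustrated under the assumption that attenuation is geometric per hop, that the chain lists each skill once, and that it is ordered from the most upstream skill to the directly responsible one. The function name and the default factor of 0.5 are hypothetical.

```python
def propagate_deviation(chain: list, deviation: float, attenuation: float = 0.5) -> dict:
    """Route a deviation signal back along a planner chain.

    chain is ordered upstream-first; its last element is the directly
    responsible skill, which receives the full deviation. Each hop
    upstream multiplies the signal by the attenuation factor.
    """
    signals = {}
    for depth, skill in enumerate(reversed(chain)):
        signals[skill] = deviation * attenuation ** depth
    return signals
```

For a chain of retrieve, summarize, answer and a deviation of 0.8, the answer skill receives 0.8, summarize receives 0.4, and retrieve receives 0.2, so upstream skills are implicated without being treated as directly responsible.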

Prior-Art Context

Conventional LLM evaluation treats calibration as an offline characterization, computed over a benchmark and reported as a static property of a model release. The disclosed mechanism contributes a continuous, per-skill, audit-bound calibration that updates with every invocation and conditions admissibility on the calibration record's currency.

Existing reliability-engineering practices employ rolling error budgets, but these are typically aggregated at the service level; they neither bind individual outputs to per-skill trust weights nor condition execution on calibration freshness. The disclosed mechanism adapts the rolling-budget intuition to per-skill granularity and structurally couples it to the gating runtime's admissibility predicate.

Confidence-calibration research in machine learning, including temperature scaling and isotonic regression, addresses the mapping from raw model scores to calibrated probabilities. The disclosed mechanism subsumes such mappings as instantiations of the mapping function reference but adds the structural layer of audit-required calibration records, deviation-asymmetric tightening, and admissibility gating that the prior research does not provide.

Disclosure Scope

The disclosure encompasses any embodiment in which trust weights attached to gated-skill outputs are calibrated continuously against a rolling history of observed-versus-predicted deviation, with deviation tightening future weights, with decay or change-point segmentation governing the history's relevance, and with the calibration record itself constituted as a mandatory audit-bound artefact whose currency conditions the skill's admissibility into the gated execution path.

The scope extends to vector-valued trust weights, federated validators, externalized calibration logs, and policy-versioned mapping functions, as well as compositions with security-layer events, parallel-skill arbitration, and capability-tier-aware history segmentation.

The disclosure does not depend on the underlying validator implementation, mapping function form, or persistence substrate; it is the structural arrangement coupling per-skill calibration to admissibility that is claimed, with arbitrary substitution of the underlying components permitted so long as the audit invariants are preserved. Implementations that compute trust as a derived quantity from any audit-bound rolling deviation history, however expressed, fall within the disclosed scope; the precise statistical machinery is illustrative rather than limiting, and the structural contributions of admissibility-conditioning, deviation-asymmetric tightening, and audit-emitted calibration records are what the disclosure intends to capture.