Mechanism
Confidence history calibration is the disclosed mechanism by which an agent's own behavioral record is treated as a structured calibration signal that may be used to refine the confidence evaluation function over time. The signal is the agent's confidence trajectory: the temporal sequence of confidence values, the structured observations that triggered each confidence change, and the outcomes of the execution and suspension decisions that followed. Because every mutation to the confidence field is already recorded in the agent's lineage, the trajectory exists as an auditable temporal record. Calibration reuses that record. It does not introduce a new score, factor, or runtime adjustment applied to live confidence values; it refines the function that computes confidence in the first place.
The confidence evaluation function is, as disclosed, a deterministic function that maps a structured input vector of agent state inputs and task state inputs to a confidence value and a confidence rate of change. Calibration operates on that function's weighting parameters, its threshold sensitivity, and its input feature selection. The behavioral history supplies the labeled examples from which those parameters are refined, so the agent's assessed sufficiency comes to track its demonstrated sufficiency without an external annotation step.
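As a rough illustration of the shape of such a function, the sketch below assumes a weighted sum passed through a logistic squash, with the rate of change taken as a finite difference against the previous confidence value. None of this is prescribed by the disclosure: the linear form, the logistic squash, the finite-difference rate, and the names `EvaluationParams` and `evaluate_confidence` are all hypothetical, chosen only to show where the three refinable parameter groups could sit.

```python
import math
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class EvaluationParams:
    """Hypothetical container for the three refinable parameter groups."""
    weights: Mapping[str, float]   # weighting parameters
    sensitivity: float             # threshold sensitivity (steepness of the squash)
    features: Tuple[str, ...]      # input feature selection


def evaluate_confidence(
    params: EvaluationParams,
    inputs: Mapping[str, float],   # agent state and task state inputs, merged
    prev_confidence: float,
    dt: float,
) -> Tuple[float, float]:
    """Return (confidence, rate of change). No randomness and no hidden state,
    so identical arguments always give identical results."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Only the selected features contribute to the score.
    score = sum(params.weights[name] * inputs[name] for name in params.features)
    # Assumed squash: maps any real-valued score into the open interval (0, 1).
    confidence = 1.0 / (1.0 + math.exp(-params.sensitivity * score))
    # Assumed rate: finite difference against the previously recorded value.
    rate = (confidence - prev_confidence) / dt
    return confidence, rate
```

Under these assumptions, calibration would produce a new `EvaluationParams` value rather than post-processing the returned confidence.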
Calibration Examples Drawn From Outcomes
The calibration examples are derived by pairing recorded confidence conditions with the outcomes that followed. When the agent's confidence was high and subsequent execution succeeded, the conditions that produced that high confidence are recorded as positive calibration examples: instances in which the evaluation function was correct to be confident. These examples reinforce the function's existing response to similar conditions.
When the agent's confidence was high but subsequent execution failed, the conditions are flagged as overconfidence indicators: conditions under which the confidence evaluation function produced an unjustifiably high confidence value. Overconfidence indicators identify exactly the situations a calibrated function should learn to treat more conservatively, because the agent's prior assessment of sufficiency was contradicted by the outcome.
When the agent's confidence was low and the resulting suspension prevented a failure that would otherwise have occurred, the conditions are flagged as successful safety interventions. Whether the suspension actually prevented a failure is determined by post-hoc analysis of the conditions that existed during suspension, not assumed from the suspension alone. A successful safety intervention is evidence that the function's conservatism in that region of the input space was warranted, and it is recorded as such so that calibration does not erode protective behavior that has demonstrably paid off.
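The three classes can be sketched as a small labeling routine over trajectory records. This is an illustration under stated assumptions, not the disclosed implementation: the record layout, the names `TrajectoryRecord` and `classify`, and especially the numeric cutoffs for "high" and "low" confidence are hypothetical, since the source uses those terms without defining them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExampleClass(Enum):
    POSITIVE = "positive_calibration_example"
    OVERCONFIDENCE = "overconfidence_indicator"
    SAFETY_INTERVENTION = "successful_safety_intervention"


@dataclass(frozen=True)
class TrajectoryRecord:
    """Hypothetical view of one lineage entry paired with its outcome."""
    confidence: float
    executed: bool                          # False means execution was suspended
    succeeded: Optional[bool] = None        # execution outcome, if it ran
    failure_averted: Optional[bool] = None  # result of post-hoc analysis of a suspension


# Assumed cutoffs, for illustration only.
HIGH = 0.8
LOW = 0.3


def classify(record: TrajectoryRecord, high: float = HIGH, low: float = LOW) -> Optional[ExampleClass]:
    """Label a record, or return None if it fits none of the three classes."""
    if record.executed and record.confidence >= high:
        if record.succeeded is True:
            return ExampleClass.POSITIVE
        if record.succeeded is False:
            return ExampleClass.OVERCONFIDENCE
    # A suspension counts only when post-hoc analysis confirms a failure was
    # averted; the suspension alone is not treated as evidence.
    if not record.executed and record.confidence <= low and record.failure_averted is True:
        return ExampleClass.SAFETY_INTERVENTION
    return None
```

Records whose outcome is not yet known, or whose suspension has not been analyzed, yield `None` and contribute no calibration example.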
Self-Supervised Refinement Without External Labeling
The accumulated calibration examples enable supervised refinement of the confidence evaluation function. The refinement targets the function's weighting parameters, its threshold sensitivity, and its input feature selection. Because the labels are the agent's own recorded outcomes, the refinement proceeds from the agent's behavioral history without requiring external labeling or human annotation. The success or failure of past execution, and the post-hoc assessment of past suspensions, are the supervision signal.
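One way to picture refinement from outcome-labeled examples is a plain gradient step on a logistic model. The disclosure leaves the refinement procedure open, so everything here is an assumption: the logistic form, the learning rate, the epoch count, the target encoding (1.0 where confidence was justified, 0.0 where it was not), and the names `predict` and `refine`.

```python
import math
from typing import Dict, List, Tuple

# (input features, target): 1.0 if high confidence was justified, 0.0 if not.
Example = Tuple[Dict[str, float], float]


def predict(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Logistic confidence under the assumed linear model."""
    z = sum(weights.get(k, 0.0) * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def refine(
    weights: Dict[str, float],
    examples: List[Example],
    lr: float = 0.1,     # assumed learning rate
    epochs: int = 200,   # assumed number of passes
) -> Dict[str, float]:
    """Return candidate weights; the input weights are left untouched so the
    change can be reviewed and recorded before it takes effect."""
    w = dict(weights)
    for _ in range(epochs):
        for features, target in examples:
            error = predict(w, features) - target
            for k, v in features.items():
                w[k] = w.get(k, 0.0) - lr * error * v
    return w


# Labels come from recorded outcomes, not from an annotator.
examples = [
    ({"evidence": 1.0, "novelty": 0.0}, 1.0),  # positive calibration example
    ({"evidence": 1.0, "novelty": 1.0}, 0.0),  # overconfidence indicator
]
current = {"evidence": 2.0, "novelty": 0.0}
candidate = refine(current, examples)
```

In this toy run the candidate weights assign lower confidence to the condition that was flagged as an overconfidence indicator, while the current weights remain available for comparison.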
This is the distinguishing property of the mechanism: the agent is the source of both the predictions being evaluated and the ground truth used to evaluate them. The confidence trajectory recorded in lineage during ordinary operation is precisely the dataset needed to test whether the evaluation function's confidence values were justified, so calibration consumes operational history rather than a separately curated training set.
Calibration as a Governance-Bounded Policy Mutation
The calibration process is itself governance-bounded. Changes to the confidence evaluation function's parameters are treated as policy mutations and are subject to the same cryptographic signing and lineage recording requirements that apply to all policy changes. A refinement to the weighting parameters, the threshold sensitivity, or the input feature selection is therefore not a silent internal adjustment: it is a recorded, signed mutation that downstream governance infrastructure can inspect.
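A signed, lineage-recorded parameter change might look like the following sketch. It is an assumption-heavy illustration: the entry layout, the hash chaining, and the names `record_mutation` and `verify_lineage` are hypothetical, and HMAC-SHA256 with a shared key stands in for whatever signing scheme the governance infrastructure actually uses (an asymmetric signature would be the more likely choice in practice).

```python
import copy
import hashlib
import hmac
import json
from typing import Any, Dict, List


def canonical(obj: Any) -> bytes:
    """Stable byte encoding so the same content always signs the same way."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def record_mutation(lineage: List[Dict[str, Any]], key: bytes,
                    old_params: Dict[str, Any], new_params: Dict[str, Any]) -> Dict[str, Any]:
    """Append a signed policy-mutation entry chained to the previous entry."""
    body = {
        "kind": "policy_mutation",
        "target": "confidence_evaluation_function",
        "old": copy.deepcopy(old_params),
        "new": copy.deepcopy(new_params),
        "prev_hash": lineage[-1]["entry_hash"] if lineage else "",
    }
    entry = dict(body, signature=hmac.new(key, canonical(body), hashlib.sha256).hexdigest())
    entry["entry_hash"] = hashlib.sha256(canonical(entry)).hexdigest()
    lineage.append(entry)
    return entry


def verify_lineage(lineage: List[Dict[str, Any]], key: bytes) -> bool:
    """Check every signature, every entry hash, and the chain between entries."""
    prev_hash = ""
    for entry in lineage:
        body = {k: v for k, v in entry.items() if k not in ("signature", "entry_hash")}
        expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["signature"]):
            return False
        if entry["prev_hash"] != prev_hash:
            return False
        unhashed = {k: v for k, v in entry.items() if k != "entry_hash"}
        if hashlib.sha256(canonical(unhashed)).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Recording both the old and new parameters in each entry is a design choice of this sketch; it lets an inspector see what changed without replaying earlier entries.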
This treatment closes the loop with the confidence field's existing audit properties. The confidence values were auditable because every mutation to the confidence field was recorded in lineage; the function that produces those values is now auditable on the same terms, because every change to that function is recorded as a signed policy mutation. An auditor can reconstruct both the confidence the agent claimed and the successive refinements to the rule that produced those claims.
Composition With the Confidence Governor
Calibration sits behind, not in front of, the confidence governor. The governor gates execution by comparing the computed confidence value, and its trajectory, against the authorization threshold; it operates on whatever value the evaluation function emits. Calibration changes how that value is emitted by refining the evaluation function, so its effect on gating is indirect. A function refined away from overconfidence will, under conditions previously flagged as overconfidence indicators, emit lower confidence values, and the unchanged governor will then suspend execution earlier under those conditions. The gating logic does not need to change for the agent's behavior to improve; the input to the gate is what improves.
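The indirection can be shown with a gate that never changes and two versions of the function feeding it. This is a simplified sketch: the threshold value, the logistic evaluator, the weights, and the names `make_evaluator` and `governor_allows` are hypothetical, and the governor's comparison of the confidence trajectory is omitted for brevity.

```python
import math
from typing import Callable, Dict

Inputs = Dict[str, float]
Evaluator = Callable[[Inputs], float]

AUTHORIZATION_THRESHOLD = 0.7  # assumed value, for illustration only


def make_evaluator(weights: Dict[str, float]) -> Evaluator:
    """Build a stand-in evaluation function from a set of weights."""
    def evaluate(inputs: Inputs) -> float:
        z = sum(weights.get(k, 0.0) * v for k, v in inputs.items())
        return 1.0 / (1.0 + math.exp(-z))
    return evaluate


def governor_allows(confidence: float, threshold: float = AUTHORIZATION_THRESHOLD) -> bool:
    """The gate: identical before and after calibration."""
    return confidence >= threshold


before = make_evaluator({"evidence": 2.0, "novelty": 0.0})   # pre-calibration
after = make_evaluator({"evidence": 2.0, "novelty": -3.0})   # refined away from overconfidence

risky = {"evidence": 1.0, "novelty": 1.0}     # condition previously flagged as overconfident
routine = {"evidence": 1.0, "novelty": 0.0}   # condition where confidence was justified

for name, conditions in (("risky", risky), ("routine", routine)):
    print(name, governor_allows(before(conditions)), governor_allows(after(conditions)))
```

With these illustrative weights, the same gate authorizes the routine condition under both functions but suspends the risky condition once the refined function is in place.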
Because the refined parameters include the weighting parameters and input feature selection as well as threshold sensitivity, calibration can also adjust which agent state inputs and task state inputs most strongly drive the confidence computation. Conditions that history shows to be reliable predictors of failure can be given more weight; inputs that did not discriminate between success and failure can be deprioritized in feature selection. The trajectory record is what makes this discrimination possible, because it pairs each recorded condition with the outcome that followed.
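A crude way to see which inputs discriminate between success and failure is to compare each input's average value across the two outcome groups. The measure below (absolute difference of group means), the cutoff `min_gap`, and the names `discrimination` and `select_features` are assumptions made for illustration; the disclosure does not name a discrimination measure.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (recorded inputs, whether the execution that followed succeeded)
Outcome = Tuple[Dict[str, float], bool]


def discrimination(history: List[Outcome]) -> Dict[str, float]:
    """Absolute gap between an input's mean under success and under failure."""
    ok = [inputs for inputs, succeeded in history if succeeded]
    bad = [inputs for inputs, succeeded in history if not succeeded]
    if not ok or not bad:
        raise ValueError("need at least one success and one failure")
    names = sorted({name for inputs, _ in history for name in inputs})
    return {
        name: abs(mean(i.get(name, 0.0) for i in ok) - mean(i.get(name, 0.0) for i in bad))
        for name in names
    }


def select_features(history: List[Outcome], min_gap: float = 0.1) -> Tuple[str, ...]:
    """Keep inputs whose gap meets the assumed cutoff; drop the rest."""
    return tuple(name for name, gap in discrimination(history).items() if gap >= min_gap)


history = [
    ({"novelty": 0.1, "load": 0.5}, True),
    ({"novelty": 0.2, "load": 0.5}, True),
    ({"novelty": 0.9, "load": 0.5}, False),
    ({"novelty": 0.8, "load": 0.5}, False),
]
selected = select_features(history)
```

In this toy history `load` takes the same value whether execution succeeded or failed, so it is dropped, while `novelty` separates the two groups and is kept.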
What the Mechanism Does Not Assert
The disclosure frames calibration as a refinement of the evaluation function from recorded outcomes, governed as a policy mutation. It does not specify a numeric calibration factor applied multiplicatively to live confidence, a bounded floor or ceiling that confidence values cannot pierce, an asymmetric tightening-versus-relaxation rate, a partition of the confidence range into bands, or a fixed-length rolling window with declared rate parameters. The mechanism's content is the use of the confidence trajectory, the three classes of calibration example, the self-supervised refinement of weighting parameters and threshold sensitivity and input feature selection, and the governance binding of that refinement as a signed, lineage-recorded policy mutation. The protective behavior of the agent comes from the confidence governor's gate and from the function being calibrated, not from a separate scalar machinery layered on top of it.
Disclosure Scope
The cognition filing (U.S. Application No. 19/647,395 and its international counterpart) discloses the following: the use of the agent's confidence trajectory, comprising the temporal sequence of confidence values, the structured observations that triggered confidence changes, and the outcomes of execution and suspension decisions, as a structured calibration signal for refining the confidence evaluation function; the derivation of positive calibration examples from high confidence followed by successful execution, overconfidence indicators from high confidence followed by failed execution, and successful safety interventions from low confidence whose suspension prevented a failure as determined by post-hoc analysis; the supervised refinement of the function's weighting parameters, threshold sensitivity, and input feature selection from the agent's own behavioral history without external labeling or human annotation; and the treatment of changes to the function's parameters as policy mutations subject to the same cryptographic signing and lineage recording requirements as all policy changes. This article describes that disclosed mechanism. The scope extends to embodiments differing in the supervised refinement procedure applied to the calibration examples, provided the calibration signal is the recorded confidence trajectory, the examples are labeled by the agent's own recorded outcomes, and the resulting parameter changes are recorded and signed as governed policy mutations.