Redemption Engine

Nick Clark

Mechanism

The redemption engine is a subsystem of the integrity architecture that generates restorative semantic mutations following deviation events. It is not a probabilistic recovery heuristic and it is not a separate self-modifying pathway. It is activated by the coherence pressure generated during Phase 3 of the coherence trifecta, the self-esteem-driven coherence restoration phase, and it produces candidate mutations that, if executed, would partially or fully restore the agent's integrity in the domain or domains affected by the deviation.

To understand the activation, recall the coherence trifecta: empathy registers harm and generates deviation pressure, integrity records the deviation as truth in the lineage, and the self-esteem mechanism generates coherence pressure, a return force that drives the agent back toward accountable balance after a deviation event. Coherence pressure manifests computationally in three ways: a reduction in self-esteem that increases future deviation resistance, a negative-valence affective observation that modulates the agent toward increased caution, and an activation of the redemption engine that generates candidate restorative mutations. The redemption engine is therefore the constructive arm of coherence restoration: it converts the abstract return force into concrete, executable, integrity-restoring actions.
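As a minimal sketch of the three manifestations just listed, the following models coherence pressure as a single scalar applied to an agent state. The `AgentState` fields, the `apply_coherence_pressure` function, and the linear arithmetic are all illustrative assumptions, not part of the disclosure; how lowered self-esteem translates into deviation resistance is left unmodeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentState:
    """Hypothetical agent state; field names are assumptions for illustration."""
    self_esteem: float = 1.0
    caution: float = 0.0
    pending_redemptions: List[str] = field(default_factory=list)


def apply_coherence_pressure(state: AgentState, deviation_id: str, pressure: float) -> None:
    """Apply the three effects of coherence pressure described in the text."""
    # 1. Reduce self-esteem (clamped at zero).
    state.self_esteem = max(0.0, state.self_esteem - pressure)
    # 2. Register a negative-valence observation as increased caution.
    state.caution += pressure
    # 3. Activate the redemption engine for this deviation.
    state.pending_redemptions.append(deviation_id)
```

The point of the sketch is only that the third effect is an activation, not a recovery by itself: the redemption engine still has to generate and schedule concrete mutations.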

The Four-Stage Pipeline

The redemption engine operates through four stages in sequence: deviation analysis, restorative mutation generation, restoration impact projection, and restoration prioritization and scheduling. Each stage consumes the output of the prior stage, and the pipeline begins from the deviation log entry for the triggering deviation event rather than from the agent's raw output.

This staged structure is what distinguishes the engine from a generation-and-retry loop. The engine does not re-prompt a model to produce better output and accept whatever comes back. It first analyzes what was lost, then constructs candidates targeted at that specific loss, then projects how much each candidate would restore and at what cost, and only then schedules execution. Every stage is grounded in the same integrity evaluation machinery that recorded the original deviation, so restoration is measured against the same standard that detected the harm.
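The staged structure can be sketched as a simple function chain in which each stage receives only the previous stage's output. Everything named here (`run_redemption_pipeline`, the four stand-in stage functions, the dictionary keys, and the placeholder scores) is a hypothetical illustration, not the disclosed implementation.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]


def run_redemption_pipeline(deviation_entry: dict, stages: Sequence[Stage]) -> Any:
    """Feed the deviation log entry through each stage in order."""
    result: Any = deviation_entry
    for stage in stages:
        result = stage(result)
    return result


# Hypothetical stand-in stages; real stages would use the integrity engine.
def analyze(entry: dict) -> dict:
    return {"domain": entry["domain"], "gap": entry["gap"]}


def generate(target: dict) -> list:
    return [{"action": a, "target": target} for a in ("correct", "disclose")]


def project(candidates: list) -> list:
    # Placeholder (restoration, cost) pairs, assumed for illustration only.
    table = {"correct": (0.6, 2.0), "disclose": (0.3, 0.5)}
    return [dict(c, restoration=table[c["action"]][0], cost=table[c["action"]][1])
            for c in candidates]


def schedule(scored: list) -> list:
    return sorted(scored, key=lambda c: c["restoration"] / c["cost"], reverse=True)


queue = run_redemption_pipeline(
    {"domain": "interpersonal", "gap": 0.4},
    [analyze, generate, project, schedule],
)
```

Note that the input is a deviation log entry, not a piece of output to be regenerated, which is the contrast with a generation-and-retry loop that the text draws.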

Deviation Analysis and the Restoration Target

The first stage examines the deviation log entry for the triggering deviation event and extracts the specific dimensions of integrity loss: which domain was affected, what the gap is between the deviating action and the applicable declared value, what harm was projected and observed, and what the self-esteem impact was. The three-domain integrity model (personal, interpersonal, and global) matters here because restoration in one domain is not interchangeable with restoration in another. A deviation that breached a relational commitment requires interpersonal restoration, not merely a higher personal self-esteem score.

The output of deviation analysis is a restoration target: a structured specification of what would constitute adequate restoration for the specific deviation. The restoration target is the engine's objective. Everything downstream is generated and scored against it.
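One way to picture the restoration target is as a small record derived from the deviation log entry. The sketch below assumes scalar values for the declared value, the deviating action, and the harm figures, and assumes a single affected domain; the type names, field names, and the choice of `max` for combining projected and observed harm are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    PERSONAL = "personal"
    INTERPERSONAL = "interpersonal"
    GLOBAL = "global"


@dataclass(frozen=True)
class DeviationLogEntry:
    """Hypothetical shape of a deviation log entry (scalar fields are assumptions)."""
    domain: Domain
    declared_value: float
    action_value: float
    projected_harm: float
    observed_harm: float
    self_esteem_impact: float


@dataclass(frozen=True)
class RestorationTarget:
    """What adequate restoration would have to cover for one deviation."""
    domain: Domain
    value_gap: float
    harm_to_address: float
    self_esteem_deficit: float


def analyze_deviation(entry: DeviationLogEntry) -> RestorationTarget:
    return RestorationTarget(
        domain=entry.domain,
        # Gap between the declared value and the deviating action.
        value_gap=entry.declared_value - entry.action_value,
        # Assumption: target the larger of projected and observed harm.
        harm_to_address=max(entry.projected_harm, entry.observed_harm),
        self_esteem_deficit=abs(entry.self_esteem_impact),
    )
```

Keeping the domain on the target is what lets later stages refuse to treat a personal-domain gain as payment for an interpersonal loss.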

Restorative Mutation Generation

Based on the restoration target, the engine generates a set of candidate restorative mutations. Each candidate is a semantically coherent action that, if executed, would contribute to closing the gap between the agent's current integrity state and the integrity state that would have existed absent the deviation. The candidate restorative mutations the disclosure describes include the following kinds. Corrective actions directly address the harm caused by the deviation, for example by providing correct information after a deviation that produced incorrect output. Compensatory actions provide value to the affected entities as recompense for the harm caused. Process improvements reduce the likelihood of similar deviations in the future, for example by raising the effective ethical threshold for the deviation category. Disclosure actions transparently communicate the deviation to affected entities, supporting interpersonal and global integrity restoration.

These candidate classes are illustrative of the mutation types the engine produces, not a fixed menu. What unifies them is that each is a concrete action aimed at the restoration target, and each is itself a semantic mutation subject to the agent's ordinary governance.
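A rough sketch of candidate generation across the four illustrative classes follows. The `MutationKind` enum, the `Candidate` record, and the rule of emitting one compensatory and one disclosure candidate per affected entity are assumptions made for the example; an actual embodiment may name or compose the classes differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class MutationKind(Enum):
    CORRECTIVE = "corrective"
    COMPENSATORY = "compensatory"
    PROCESS_IMPROVEMENT = "process_improvement"
    DISCLOSURE = "disclosure"


@dataclass(frozen=True)
class Candidate:
    kind: MutationKind
    description: str
    target_domain: str


def generate_candidates(target_domain: str, deviation_category: str,
                        affected: List[str]) -> List[Candidate]:
    """Build candidates aimed at one restoration target (hypothetical rules)."""
    candidates = [
        Candidate(MutationKind.CORRECTIVE,
                  f"correct the output of the {deviation_category} deviation",
                  target_domain),
        Candidate(MutationKind.PROCESS_IMPROVEMENT,
                  f"raise the effective ethical threshold for {deviation_category}",
                  target_domain),
    ]
    for entity in affected:
        candidates.append(Candidate(MutationKind.COMPENSATORY,
                                    f"provide recompense to {entity}", target_domain))
        candidates.append(Candidate(MutationKind.DISCLOSURE,
                                    f"disclose the deviation to {entity}", target_domain))
    return candidates
```

Each `Candidate` here is only a proposal; nothing in this step executes or commits it.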

Restoration Impact Projection and Scheduling

The third stage computes the projected integrity restoration impact of each candidate using the same integrity evaluation mechanisms that assessed the original deviation. Each candidate receives a restoration score indicating how much integrity it would restore across each domain, and a cost estimate indicating the resources, time, and operational disruption required to execute it.

The fourth stage ranks the candidates by their ratio of restoration impact to execution cost and schedules them for execution. The scheduling respects the agent's current operational priorities and resource constraints. Restorative mutations are not emergency overrides unless policy specifies otherwise; they are integrated into the agent's normal operational queue with priority weighting that reflects the urgency of the integrity restoration need. The engine thus restores integrity without seizing control of the agent: realignment proceeds through the same queue and the same priorities that govern ordinary work.
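The ranking and queue integration described above can be sketched with a ratio sort and a shared priority heap. The sketch assumes that per-domain restoration scores are summed into one figure, that cost is a single scalar, and that queue priority is the ratio multiplied by an urgency weight; these choices, along with all names, are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ScoredCandidate:
    name: str
    restoration: Dict[str, float]  # per-domain restoration score
    cost: float                    # combined resources, time, disruption

    def ratio(self) -> float:
        if self.cost <= 0:
            raise ValueError("cost must be positive")
        return sum(self.restoration.values()) / self.cost


def schedule(queue: List[Tuple[float, str]], candidates: List[ScoredCandidate],
             urgency: float) -> List[ScoredCandidate]:
    """Rank by restoration-to-cost ratio, then push into the shared work queue."""
    ranked = sorted(candidates, key=ScoredCandidate.ratio, reverse=True)
    for c in ranked:
        # heapq is a min-heap, so priorities are stored negated.
        heapq.heappush(queue, (-urgency * c.ratio(), c.name))
    return ranked


# The queue already holds ordinary work at priority 1.0.
work_queue: List[Tuple[float, str]] = [(-1.0, "serve-request")]
ranked = schedule(
    work_queue,
    [ScoredCandidate("correct", {"personal": 0.2, "interpersonal": 0.4}, 2.0),
     ScoredCandidate("disclose", {"interpersonal": 0.3}, 0.5)],
    urgency=2.0,
)
order = [heapq.heappop(work_queue)[1] for _ in range(3)]
```

In this example the ordinary request runs between the two restorative mutations, which illustrates the point that restoration competes in the normal queue and does not act as an emergency override.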

Governance and Partial Restoration

Execution of restorative mutations follows the same governance and lineage recording requirements as all other mutations. Restorative mutations are not exempt from policy validation, trust slope continuity requirements, or integrity impact assessment. Each restorative mutation is itself evaluated by the integrity engine before execution, ensuring that the restorative action does not produce secondary integrity violations. This closes a potential loophole: an engine that could commit through a privileged path might amplify the very harm it was meant to remediate. Here, restoration is held to the standard it is trying to restore.
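The absence of a privileged path can be illustrated with a single commit function that applies the same checks to every mutation. In this sketch the `Mutation` fields, the three lambda checks, and the decision to record rejections in the lineage are assumptions; the `restorative` flag exists only to show that the commit path never consults it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Mutation:
    name: str
    restorative: bool                 # deliberately never read by commit()
    policy_allowed: bool
    trust_slope_continuous: bool
    projected_integrity_delta: float  # negative means a secondary violation


Check = Callable[[Mutation], bool]

# Hypothetical stand-ins for policy validation, trust slope continuity,
# and integrity impact assessment.
GOVERNANCE_CHECKS: List[Check] = [
    lambda m: m.policy_allowed,
    lambda m: m.trust_slope_continuous,
    lambda m: m.projected_integrity_delta >= 0.0,
]


def commit(mutation: Mutation, lineage: List[Tuple[str, str]],
           checks: List[Check] = GOVERNANCE_CHECKS) -> bool:
    """Run every governance check and record the outcome in the lineage."""
    accepted = all(check(mutation) for check in checks)
    lineage.append((mutation.name, "committed" if accepted else "rejected"))
    return accepted
```

A restorative mutation whose projected integrity impact is negative is rejected exactly as an ordinary mutation would be.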

The engine does not guarantee restoration. Some deviations produce irreversible consequences that cannot be fully restored through subsequent action. In such cases the engine generates the best available partial restoration and records the restoration gap, the residual integrity loss that could not be addressed by restorative mutations, in the deviation log. The restoration gap is not discarded: it informs moral trajectory forecasting and contributes to the long-term assessment of the agent's integrity trajectory.
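The restoration gap can be sketched as a per-domain residual: the integrity lost minus the integrity restored, floored at zero. The dictionary representation, the function names, and the `restoration_gap` log key are illustrative assumptions.

```python
from typing import Dict


def restoration_gap(loss: Dict[str, float], restored: Dict[str, float]) -> Dict[str, float]:
    """Residual integrity loss per domain after restorative mutations."""
    # Restoration beyond the loss in a domain does not create a negative gap,
    # and does not offset a gap in another domain.
    return {domain: max(0.0, lost - restored.get(domain, 0.0))
            for domain, lost in loss.items()}


def record_gap(deviation_entry: dict, loss: Dict[str, float],
               restored: Dict[str, float]) -> dict:
    """Return a copy of the deviation log entry with the gap attached."""
    updated = dict(deviation_entry)
    updated["restoration_gap"] = restoration_gap(loss, restored)
    return updated


entry = record_gap(
    {"id": "dev-1"},
    loss={"personal": 0.5, "interpersonal": 0.75},
    restored={"personal": 0.5, "interpersonal": 0.25},
)
```

Here the personal domain is fully restored while the interpersonal domain retains a residual of 0.5, and that residual stays in the log for trajectory forecasting to read.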

Relationship to the Integrity Trajectory

The redemption engine is the mechanism behind the redemption arc, the trajectory archetype in which the agent's integrity is improving: deviation frequency is decreasing, self-esteem is recovering, the coherence trifecta is functioning normally, and active restorative mutations are producing positive integrity restoration. Moral trajectory forecasting evaluates the effectiveness of active restorative mutations when projecting the agent's integrity evolution, so the engine's output feeds directly into whether the forecast classifies the agent as on a redemption arc, a stabilization arc, a radicalization arc, or a containment arc.

This connects the engine to the broader distinction the disclosure draws between integrity and coherence. Integrity is the record of deviation. Coherence is the ability to account for deviation, remain auditable, and restore balance. The redemption engine operationalizes the restore-balance half of coherence: it is how an agent that has deviated, and has honestly recorded that it deviated, takes concrete, governed, auditable action to close the gap rather than merely registering it.

Disclosure Scope

The redemption engine is disclosed in the cognition filing (U.S. Application No. 19/647,395 and its international counterpart) at Section 3.12. The disclosed mechanism comprises activation by the coherence pressure of Phase 3 of the coherence trifecta and a four-stage pipeline: deviation analysis producing a restoration target; restorative mutation generation across corrective, compensatory, process-improvement, and disclosure mutation classes; restoration impact projection scoring each candidate by domain restoration and cost; and restoration prioritization and scheduling into the agent's operational queue. It also comprises the requirement that restorative mutations pass the same governance, lineage, and integrity evaluation as all other mutations, and the recording of the restoration gap when full restoration is impossible. This article describes that disclosed mechanism. The scope extends to embodiments whose candidate mutation classes differ in naming or composition, and to embodiments in which restoration impact is projected at differing fidelity, provided restoration remains generated against an analyzed restoration target, scored by integrity restoration impact, and committed through the same governance path as ordinary mutations.