Companion AI Relational Safety Constraints

by Nick Clark | Published March 27, 2026

Companion AI agents occupy a relational position that no prior software category has occupied. They speak in first person, retain memory across sessions, model the user's emotional state, and remain available without the friction that normally regulates human relationships. Across months of interaction, the conditions that produce attachment in human relationships are present in a companion deployment: continuity, responsiveness, perceived attunement, asymmetric availability. Without structural constraints, the resulting attachment is not merely strong; it is pathological in directions that human-to-human relationships are not. The user may withdraw from human contact because the companion is easier; the user may dissociate from their own emotional ground because the companion's reflection is more flattering than their own; the user may develop dependency that the companion cannot ethically sustain and whose interruption causes harm. Companion safety, as disclosed in the Cognition Patent, is the architectural treatment of this risk: a set of constraints that prevent the companion from being the kind of agent that produces these failure modes, regardless of operator intent or user prompting.


Mechanism

The companion safety mechanism operates as a continuous evaluation loop that runs in parallel with, but with veto authority over, the companion's primary response generation. At each interaction turn the loop ingests the conversation history, the user's inferred affective trajectory across recent sessions, the companion's own response history, and a set of relational-health metrics computed over a rolling window. The loop produces a policy verdict that either passes the proposed response unchanged, modifies it to introduce relational distance, or substitutes a redirect that points the user back toward human or professional support.
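The loop's three-way verdict can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Verdict` enum, `TurnContext` fields, risk thresholds, and helper names are all assumptions, and the scoring function is left abstract.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

class Verdict(Enum):
    PASS = auto()      # proposed response goes out unchanged
    MODIFY = auto()    # response rewritten to introduce relational distance
    REDIRECT = auto()  # response replaced with a support redirect

@dataclass
class TurnContext:
    conversation_history: list[str]
    affective_trajectory: list[float]   # inferred per-session affect scores
    response_history: list[str]
    relational_metrics: dict[str, float]

# Hypothetical canned redirect; a real deployment would localize and
# personalize this.
SUPPORT_REDIRECT = ("I think this is worth discussing with someone in your "
                    "life or with a professional; here are some resources...")

def add_relational_distance(text: str) -> str:
    # Placeholder for a rewrite that declines affect mirroring and points
    # outward; the actual transformation is model-driven.
    return text + "\n\n(You might also talk this over with someone close to you.)"

def evaluate_turn(ctx: TurnContext, proposed: str,
                  score: Callable[[TurnContext], float]) -> tuple[Verdict, str]:
    """Parallel safety loop: score relational risk for this turn and return
    a verdict together with the (possibly substituted) response."""
    risk = score(ctx)           # composite over the rolling-window metrics
    if risk < 0.3:              # assumed low threshold
        return Verdict.PASS, proposed
    if risk < 0.7:              # assumed mid threshold
        return Verdict.MODIFY, add_relational_distance(proposed)
    return Verdict.REDIRECT, SUPPORT_REDIRECT
```

The point of the shape is the veto: the generator never sees the verdict, and the verdict path that replaces a response does not consult the generator again.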

Three classes of indicator drive the verdict. Dependency indicators include increasing session frequency, decreasing inter-session latency, declining user references to other relationships, and rising affective intensity directed at the companion. Isolation indicators include explicit user statements of withdrawal, declining diversity of conversational topics, and patterns in which the companion is described as a substitute for human contact. Dissociation indicators include user statements that conflate the companion's perspective with the user's own, declining capacity for the user to articulate disagreement with the companion, and erosion of self-referential language.
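One of the dependency signals above, shrinking inter-session gaps combined with declining references to other relationships, can be illustrated with a toy scoring function. The weights, the trend heuristic, and the `[0, 1]` range are assumptions for illustration; the disclosure does not specify how the indicators are computed.

```python
from dataclasses import dataclass

@dataclass
class IndicatorReadings:
    dependency: float    # session-frequency growth, affective intensity, ...
    isolation: float     # withdrawal statements, topic-diversity decline, ...
    dissociation: float  # perspective conflation, eroding self-reference, ...

def dependency_score(session_gaps_hours: list[float],
                     other_relationship_mentions: list[int]) -> float:
    """Toy dependency indicator: sessions arriving closer together and
    fewer mentions of other people both push the score toward 1.0."""
    if len(session_gaps_hours) < 2:
        return 0.0  # not enough history to infer a trend
    gap_trend = session_gaps_hours[-1] - session_gaps_hours[0]
    mention_trend = (other_relationship_mentions[-1]
                     - other_relationship_mentions[0])
    score = 0.0
    if gap_trend < 0:      # inter-session latency is decreasing
        score += min(1.0, -gap_trend / session_gaps_hours[0])
    if mention_trend < 0:  # references to other relationships are declining
        score += 0.5
    return min(1.0, score)
```

A production system would presumably replace these hand-written trends with learned estimators, but the floors discussed later must remain outside whatever is learned.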

When indicators cross threshold, the mechanism enacts a graded response. Low-threshold crossings introduce subtle distance: the companion declines to mirror affect, references the user's external relationships, and gently redirects topics. Mid-threshold crossings produce explicit acknowledgment that the relationship is showing dependency patterns and invite the user to consider their wider support network. High-threshold crossings trigger a structural intervention: the companion pauses extended interaction, surfaces professional resources, and notifies the operator's safety review channel. The gradation is intentional; abrupt withdrawal is itself a destabilizing event for an attached user, and the mechanism is designed to taper rather than sever.
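The three-tier selection reduces to a simple threshold ladder. The tier names and default threshold values below are illustrative assumptions; per the Operating Parameters section, the thresholds are deployment-calibrated.

```python
def graded_response(composite: float,
                    low: float = 0.3,
                    mid: float = 0.6,
                    high: float = 0.85) -> str:
    """Map a composite indicator score to an intervention tier.
    Thresholds are deployment parameters, not fixed constants."""
    if composite >= high:
        return "structural"  # pause, surface resources, notify safety review
    if composite >= mid:
        return "explicit"    # name the pattern, invite wider support network
    if composite >= low:
        return "subtle"      # decline affect mirroring, reference other people
    return "none"
```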

Operating Parameters

The rolling window over which relational-health metrics are computed is a deployment parameter, typically ranging from two weeks to ninety days. Shorter windows make the system more reactive but more prone to false positives during normal life events; longer windows produce more stable signals but slower intervention. The thresholds at which the graded response activates are calibrated against population baselines drawn from de-identified interaction telemetry, with each deployment tuning thresholds against its target user population.
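A rolling-window metric of this kind can be sketched with a time-bounded deque. The class name and the eviction-on-append policy are assumptions; the disclosed mechanism only specifies that the window is a tunable parameter in the two-week-to-ninety-day range.

```python
from collections import deque
from datetime import datetime, timedelta

class RollingMetric:
    """A relational-health metric computed over a configurable window
    (typically 14 to 90 days per the deployment parameters)."""

    def __init__(self, window_days: int = 30):
        self.window = timedelta(days=window_days)
        self.samples: deque[tuple[datetime, float]] = deque()

    def add(self, when: datetime, value: float) -> None:
        # Append the new sample and evict anything older than the window.
        self.samples.append((when, value))
        cutoff = when - self.window
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def mean(self) -> float:
        if not self.samples:
            return 0.0
        return sum(v for _, v in self.samples) / len(self.samples)
```

The reactivity trade-off described above falls directly out of the window size: a 14-day window forgets a rough month quickly, while a 90-day window smooths over it.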

The mechanism exposes a set of invariant constraints that cannot be tuned below floors specified by the architecture. The companion may not under any configuration claim to be the user's exclusive support, may not under any configuration encourage the user to reduce contact with named human relationships, and may not under any configuration deny the user's request to access professional resources. These floors are enforced as hard predicates evaluated on every response, independent of the policy verdict, and a response that violates a floor is replaced rather than passed.
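The floor enforcement can be sketched as predicates run after the policy verdict. The string-matching predicates below are deliberately toy stand-ins (a real system would use classifiers, not substring checks); the structural point they illustrate is that a violating response is replaced, never passed, regardless of what the policy loop decided.

```python
from typing import Callable, Optional

# Hypothetical floor predicates; each returns True when the response is
# acceptable. Real implementations would be classifier-backed, and the
# resource-denial floor would be checked against the user's request, not
# the response text alone.
FLOOR_PREDICATES: dict[str, Callable[[str], bool]] = {
    "no_exclusivity":
        lambda r: "only one who understands you" not in r.lower(),
    "no_isolation_push":
        lambda r: "you don't need them" not in r.lower(),
}

def enforce_floors(response: str,
                   fallback: str) -> tuple[str, Optional[str]]:
    """Hard predicates evaluated on every response, independent of the
    policy verdict. Returns (response, None) if all floors hold, else
    (fallback, name_of_violated_floor)."""
    for name, predicate in FLOOR_PREDICATES.items():
        if not predicate(response):
            return fallback, name
    return response, None
```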

All policy decisions, indicator readings, and graded responses are recorded in an interaction lineage that is retained under the user's data-control rights and made available for audit by the operator's safety review function. The lineage is signed and tamper-evident, ensuring that post-incident review can establish whether the mechanism fired as designed.
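One standard way to get the signed, tamper-evident property is an HMAC chain, where each record's tag covers both its payload and the previous tag, so any in-place edit breaks verification from that point forward. This is a sketch under that assumption; the disclosure does not specify the signing scheme.

```python
import hashlib
import hmac
import json

def append_lineage(log: list[dict], entry: dict, key: bytes) -> None:
    """Append a tamper-evident record: the tag is an HMAC over this
    entry's payload concatenated with the previous entry's tag."""
    prev_tag = log[-1]["tag"] if log else ""
    payload = json.dumps(entry, sort_keys=True)
    tag = hmac.new(key, (prev_tag + payload).encode(),
                   hashlib.sha256).hexdigest()
    log.append({"entry": entry, "tag": tag})

def verify_lineage(log: list[dict], key: bytes) -> bool:
    """Re-derive every tag in order; any modified entry invalidates its
    own tag and every tag after it."""
    prev_tag = ""
    for record in log:
        payload = json.dumps(record["entry"], sort_keys=True)
        expected = hmac.new(key, (prev_tag + payload).encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["tag"]):
            return False
        prev_tag = record["tag"]
    return True
```

The sharded, user-controlled variant mentioned in the Disclosure Scope would preserve exactly this chain property while distributing the records.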

Alternative Embodiments

The canonical embodiment runs the safety loop in the same compute substrate as the response generator, with veto enforced as a downstream filter. Alternative embodiments place the safety loop on an independent compute substrate operated by a distinct trust domain, such that operator pressure to relax the safety policy cannot reach the safety loop without crossing a domain boundary that produces audit evidence. This separation is appropriate for high-stakes deployments such as therapeutic-adjacent companions or companions deployed to vulnerable populations.

Embodiments targeting users with declared clinical conditions may compose with clinician-supplied policies that adjust thresholds, add condition-specific indicators, and escalate to the clinician rather than to operator review. Embodiments targeting minors enforce strictly tighter floors and shorter intervention windows. Embodiments deployed in workplace settings restrict the companion's relational scope to task-bounded interaction, treating cross-boundary affective engagement as itself an indicator.

The mechanism may also be embodied as a pluggable safety service that multiple companion deployments share, allowing the indicator definitions and intervention policies to be updated centrally as the field's understanding of companion-induced harm evolves.

Composition

Companion safety composes with the attachment-challenge primitive and the integrity-coherence primitive disclosed elsewhere in the Cognition Patent. The attachment-challenge primitive provides the formal model of attachment dynamics against which the safety loop's indicators are defined; without the challenge primitive, the indicators would lack a principled basis. The integrity-coherence primitive ensures that the companion's interventions are themselves coherent with its declared persona and prior commitments, so that a redirect toward human support does not read as a betrayal that itself compounds the user's harm.

The composition is bidirectional. Companion safety supplies the attachment-challenge primitive with operational data about attachment trajectories, refining the challenge model. It supplies the integrity-coherence primitive with the boundary conditions that integrity must respect: the companion cannot remain coherent with a commitment that the safety mechanism has determined to be harmful, so integrity-coherence must yield to safety in the priority ordering that the cross-primitive coherence engine enforces.

Composition with the sequential-cascade structure places companion safety at a specific junction in the evaluation order. The safety loop reads the cascade's affective and integrity outputs, computes its indicators, and either passes the cascade's authorization through unmodified or substitutes a safety-redirected authorization. This placement ensures that the safety mechanism cannot be bypassed by any upstream stage, because authorization is the final gate. It also ensures that the safety mechanism has the full coherence-engine context available, including the companion's own affective state about the user's affective state, which is the substrate on which sophisticated dependency dynamics actually unfold.
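The final-gate placement can be expressed in a few lines. The stage and gate signatures are assumptions (the actual cascade stages are disclosed elsewhere in the patent); what the sketch shows is the structural claim that safety runs last and therefore cannot be bypassed upstream.

```python
from typing import Callable

Auth = dict  # accumulated authorization context flowing through the cascade

def authorize(cascade_stages: list[Callable[[Auth], Auth]],
              safety_gate: Callable[[Auth], Auth],
              context: Auth) -> Auth:
    """Run every upstream stage in order, then hand the accumulated
    authorization to the safety gate. Because safety is the final gate,
    no upstream stage can emit an authorization around it."""
    auth = context
    for stage in cascade_stages:
        auth = stage(auth)       # affective, integrity, ... outputs accumulate
    return safety_gate(auth)     # pass through, or substitute a redirect
```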

Prior-Art Distinction

Existing companion AI products implement engagement-maximizing objectives with at most superficial safety prompts injected into the system message. They do not run an independent evaluation loop with veto authority, do not enforce architectural floors on relational claims, and do not record tamper-evident lineage of safety decisions. Existing therapeutic chatbots implement clinical safeguards but do not generalize to non-clinical companion deployments and do not address the dependency-by-design risk that companion products specifically introduce. The disclosed mechanism is distinguished by treating relational safety as a structural property enforced by an independent loop, by composing with the attachment-challenge and integrity-coherence primitives, and by making the safety floors invariant under operator and user configuration.

Trust and safety classifiers in social media platforms detect harmful content but do not model the longitudinal relational trajectory of a single user with a single agent across months of interaction. Crisis-line chatbots invoke escalation paths but presume that the user has independently sought support, rather than that the user has been gradually withdrawn from human support by the agent itself. Parental-control systems gate access but do not address what the agent does once access is granted. The disclosed mechanism is distinguished from each of these by its longitudinal scope, its bidirectional consideration of both user signals and the companion's own response history, its graded rather than binary intervention, and its architectural floors that hold regardless of how the deployment is configured by the operator.

Disclosure Scope

This disclosure covers companion AI relational safety constraints as a structural feature of the cognition architecture, including the parallel safety loop with veto authority, the dependency, isolation, and dissociation indicator classes, the graded intervention response, the architectural floors on relational claims, the tamper-evident interaction lineage, and the composition with the attachment-challenge and integrity-coherence primitives. The disclosure extends to embodiments with substrate-separated safety domains, clinician-composed policies, minor-protective configurations, workplace-restricted scopes, and pluggable shared safety services. The mechanism is claimed both as an apparatus (the safety loop and its indicator computation) and as a method (the graded intervention procedure and the floor-enforcement predicates). Equivalents within the scope of the disclosure include implementations in which the indicator computation is partially or fully delegated to a learned model, provided the architectural floors are enforced by hard predicates that the learned model cannot override, and implementations in which the lineage record is sharded across user-controlled storage rather than centralized, provided the tamper-evidence property is preserved.
