Adept AI Automates Actions Without Structural Integrity

by Nick Clark | Published March 28, 2026

Adept AI builds AI agents that understand user intent and execute multi-step actions in software applications. The agents observe screens, plan action sequences, and execute clicks, keystrokes, and navigation steps to complete tasks. The action capability is genuine. But an agent that can take actions and an agent that maintains structural integrity across those actions are different systems. An action agent without coherence architecture can execute a sequence of individually correct steps that produce a collectively incoherent outcome. The gap is between action capability and structural integrity, and it is the gap the AQ human-relatable intelligence primitive is designed to close.


1. Vendor and Product Reality

Adept AI Labs, founded in 2022 by veterans of the original Transformer paper and the OpenAI Codex team, was the first generally recognized "action model" company — the firm that defined the category of large models that take actions in software rather than emit text answers. Its trajectory through 2024 was emblematic of the action-agent thesis: ACT-1 demonstrated browser-and-desktop automation from natural-language instructions, Fuyu and Persimmon advanced multimodal screen understanding, and the company built integrations targeting enterprise productivity and analyst workflows. In mid-2024 most of the founding research team and the underlying technology transferred to Amazon under a licensing arrangement that left Adept the operating company while folding the action-model research into Amazon's AGI organization, and the residual Adept entity continues to deliver enterprise agent products to existing customers and partners.

Architecturally an Adept-style action agent is a multimodal model that perceives application interfaces (DOM tree, accessibility tree, raster screen), interprets a natural-language objective, decomposes it into a sequence of UI-grounded actions (click, type, scroll, navigate), executes each action through a controlled browser or desktop driver, observes the resulting state, and re-plans on divergence. The action surface is the same surface a human operator uses, which is the strategic point of the category: the agent works against the long tail of enterprise software that does not expose stable APIs, and onboarding does not require integration engineering. Error recovery is handled by re-perception and re-planning. Long-horizon tasks are handled by chaining sub-objectives and persisting intermediate state in a working memory.
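The perceive-plan-act-observe shape just described can be reduced to a short sketch. This is not Adept's API; `perceive`, `plan`, and `execute` are hypothetical stand-ins for the perception model, planner, and UI driver, and the divergence test is deliberately naive:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # "click", "type", "scroll", "navigate"
    target: str     # reference to a UI element
    value: str = ""

def run_task(objective, perceive, plan, execute, max_steps=50):
    """Minimal plan-and-execute loop with re-planning on divergence.

    perceive() -> current screen state; plan(objective, state) -> list of
    (Action, expected_next_state) pairs; execute(action) performs one action.
    All three callables are hypothetical stand-ins.
    """
    state = perceive()
    steps = plan(objective, state)
    taken = 0
    while steps and taken < max_steps:
        action, expected = steps.pop(0)
        execute(action)
        state = perceive()          # re-perception after every action
        taken += 1
        if state != expected:       # divergence: throw away the plan, re-plan
            steps = plan(objective, state)
    return state
```

Note what the loop contains: a per-step expectation check and a re-plan fallback. There is no term anywhere in it for cross-step consistency, which is exactly the structural gap the rest of this article is about.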

The commercial achievement is real. Adept and the broader action-agent category — which now includes Anthropic's computer-use mode, OpenAI's Operator, the various open-source browser-use and OS-use stacks, and embedded agentic features in the major productivity suites — have demonstrated practical automation of meaningful enterprise workflows: form filling, report assembly, ticket triage, cross-application data movement, and analyst-style research. Within the scope of step-level execution the proposition holds. The architectural shape, however, is plan-and-execute with re-plan on divergence. Step correctness is the design center; cross-step, cross-session, cross-domain structural integrity is not.

2. The Architectural Gap

The structural property the Adept stack — and the action-agent category generally — does not exhibit is closed-loop integrity over the action stream. Each action is selected against the current screen and the remaining sub-objective, with no first-class representation of the agent's commitments across the workflow, of calibrated confidence in its own interpretation, or of empathy for the user whose intent is being executed at one remove. Failures collapse into "the step did not produce the expected next state, so re-plan." Drift collapses into "the user can interrupt." Auditability collapses into "we logged the actions." None of these is a structural property; they are mitigations wrapped around an agent that is, structurally, a step executor.

The gap matters because enterprise action-taking is not a sequence of independent steps. It is a continuous commitment whose value depends on coherence. Did the form values entered in step seven contradict those entered in step three? Did the cross-application reconciliation honor the business rule that spans the two systems? Did the agent's interpretation of "send the contract to procurement" preserve the user's underlying goal when procurement's intake form turned out to require a field the user did not anticipate? Did the agent's confidence in an ambiguous screen warrant proceeding rather than pausing? None of these is observable from per-step success rate, and none is recoverable by training a more capable action model. They require a cognitive architecture above the action model, not a stronger action model.
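The first of those questions — contradictory form values across steps — makes the point concrete, because it is checkable only over the whole action stream, never at a single step. A toy integrity check over a recorded stream might look like this (the `(step, field, value)` record format is an assumption for illustration, not anything Adept emits):

```python
def cross_step_violations(form_history):
    """Flag fields whose value at a later step contradicts an earlier step.

    form_history: list of (step_number, field_name, value) tuples, the
    hypothetical record of every form entry the agent made in a workflow.
    Returns a list of human-readable violation descriptions.
    """
    seen = {}        # field_name -> (step_number, value) of last entry
    violations = []
    for step, field, value in form_history:
        if field in seen and seen[field][1] != value:
            prev_step, prev_value = seen[field]
            violations.append(
                f"step {step} set {field}={value!r}, "
                f"contradicting step {prev_step} ({prev_value!r})"
            )
        seen[field] = (step, value)
    return violations
```

Each individual entry in the stream is a perfectly successful step; only the stream-level check sees the contradiction. That is the sense in which per-step success rate cannot observe this failure class.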

Adept cannot patch this from within the action-model paradigm because the paradigm itself optimizes a per-step objective conditioned on a planner's sub-goal. Adding longer plan horizons, larger memory windows, or stronger re-planning produces longer step sequences, not a different architecture. The agent still has no structural mechanism for asking whether its interpretation of the user's intent has drifted, whether its confidence is calibrated to the ambiguity of the current screen, whether its identity across a session is the identity the user authorized, or whether the cumulative effect of its actions remains within the commitments under which the user delegated. The missing primitive is human-relatable intelligence: a closed cognitive architecture in which integrity, self-esteem, and empathy feedback loops constrain the action policy, with conformity attestation that makes agent behavior auditable as a structural property rather than as a log.

3. What the AQ Human-Relatable Intelligence Primitive Provides

The Adaptive Query human-relatable intelligence primitive specifies a closed cognitive architecture comprising three structural feedback loops, a coherence engine that arbitrates among them, narrative-identity continuity across sessions, graceful degradation under uncertainty, and conformity attestation that exposes the agent's reasoning in terms of its architectural constraints. The integrity loop monitors whether the agent's actions across the workflow remain consistent with the user's declared intent and the agent's prior commitments, flagging drift before it becomes irreversible. The self-esteem loop validates whether the agent's confidence in its current interpretation is calibrated to the ambiguity actually present in the screen and the task, distinguishing "high-confidence step against an unambiguous interface" from "high-confidence step against an interface variant the agent has not encountered before."

The empathy loop, parameterized for a delegating principal rather than a conversational partner, monitors whether the agent's behavior remains aligned with the user's evolving intent as that intent becomes clearer through interaction or clearer through the discovery of constraints the user did not anticipate. It governs when the agent proceeds, when it pauses to verify, and how it surfaces ambiguity in a form the user can resolve. The coherence engine arbitrates among the three loops, producing a unified policy that does not optimize step completion at the cost of integrity, does not exceed calibrated confidence, and does not race past the user's effective consent. Graceful degradation under uncertainty contracts the agent's action scope when any loop reports reduced support — it pauses, partially executes, or refuses with a structured rationale rather than continuing at full confidence with declining support.
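One way to read the arbitration-plus-degradation behavior described above is as a weakest-link rule: the action scope contracts to the least-supported loop rather than averaging weakness away. The sketch below is an illustrative reading under that assumption; the loop scores, thresholds, and decision labels are all hypothetical, not taken from the AQ specification:

```python
def arbitrate(integrity, self_esteem, empathy,
              proceed_threshold=0.8, pause_threshold=0.5):
    """Illustrative coherence-engine arbitration (thresholds hypothetical).

    Each loop reports support in [0, 1]. The engine takes the minimum —
    a weakest-link rule — so one degraded loop contracts the whole action
    scope, and degradation is graceful rather than binary.
    """
    support = min(integrity, self_esteem, empathy)
    if support >= proceed_threshold:
        return "proceed"
    if support >= pause_threshold:
        return "pause_and_verify"        # surface the ambiguity to the user
    return "refuse_with_rationale"       # structured refusal, not silent halt
```

The design point the minimum captures: a step that is high-confidence and intent-aligned but integrity-violating must not proceed just because two of three scores are strong.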

Narrative identity is the architectural commitment that the agent maintains a coherent account of who it is, what it is doing, and why, across pauses, interruptions, sessions, and handoffs. Conformity attestation exposes the integrity-self-esteem-empathy reasoning behind any action as a first-class output, so when the user asks why the agent did what it did, the answer traces structurally rather than narratively. The primitive is technology-neutral with respect to the underlying action model, planner, and UI driver; what it imposes is the closed cognitive shape. It composes hierarchically, so an individual agent, a per-user agent fleet, a tenant, and a multi-tenant deployment each instantiate the same loops at the appropriate scale. The inventive step disclosed in the AQ human-relatable intelligence application is the closed three-loop architecture with coherence-engine arbitration, narrative identity, and conformity attestation as a structural condition for AI agents that act on a user's behalf under sustained delegation.
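If attestation is a first-class output rather than a log, it has a shape. The record below is a sketch of what a per-action attestation might contain under the architecture just described; the field names and JSON encoding are assumptions for illustration, not the format disclosed in the AQ application:

```python
import json
import time

def attest(action, integrity, self_esteem, empathy, decision, rationale):
    """Sketch of a per-action conformity attestation record.

    Unlike an action log ("the agent clicked X"), the record carries the
    loop-level reasoning behind the action, so "why did it do that?" is
    answered structurally. All field names here are hypothetical.
    """
    record = {
        "timestamp": time.time(),
        "action": action,
        "loop_support": {
            "integrity": integrity,
            "self_esteem": self_esteem,
            "empathy": empathy,
        },
        "decision": decision,       # e.g. proceed / pause / refuse
        "rationale": rationale,     # structured, user-facing explanation
    }
    return json.dumps(record)
```

The contrast with L-style action logging is the point: the log records what happened; the attestation records why it was admissible.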

4. Composition Pathway

Adept integrates with AQ as a domain-specialized perception-and-action stack running underneath the human-relatable intelligence cognitive architecture. What stays at Adept (and at the action-agent layer it represents): the multimodal perception, the UI grounding, the planner, the action driver, the application-specific adapters, the enterprise integrations, the deployment and management surface, and the commercial relationship with the customer. Adept's investment in action-model capability — the screen-understanding model, the action-grounding training data, the cross-application generalization — remains its differentiated capability and the source of its product value.

What moves to AQ as cognitive substrate: the integrity, self-esteem, and empathy loops, the coherence engine, narrative identity, and conformity attestation sit above the action agent and arbitrate its outputs against session-level commitments and the user's evolving intent. Integration points are well-defined. The Adept planner emits candidate actions with a calibrated-confidence and intent-alignment annotation rather than executing directly; the coherence engine evaluates them against the current commitment set and either admits, defers, partially executes, or refuses with a structured rationale exposed to the user. Cross-session continuity is provided by the narrative-identity layer rather than by raw conversation logs, so an interrupted task resumes against a coherent self-account rather than a re-interpretation. Conformity attestation produces an audit-grade reasoning record per action, available to the user, the tenant administrator, and downstream compliance.
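The planner-to-coherence-engine handoff described above — candidate actions annotated with calibrated confidence and intent alignment, then admitted, deferred, partially executed, or refused — can be sketched as an admission gate. Everything here is hypothetical: the thresholds, the `violates` predicate, and the annotation fields are illustrative assumptions, not Adept's or AQ's actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class CandidateAction:
    description: str
    confidence: float        # planner's calibrated confidence in its interpretation
    intent_alignment: float  # estimated alignment with the user's declared intent

def evaluate(candidate, commitments, violates,
             min_confidence=0.5, min_alignment=0.7):
    """Admission gate between planner and UI driver (all thresholds and the
    violates(candidate, commitment) predicate are hypothetical).

    Returns (decision, rationale) where decision is one of:
    admit / partial / defer / refuse.
    """
    broken = [c for c in commitments if violates(candidate, c)]
    if broken:
        return "refuse", f"conflicts with commitments: {broken}"
    if candidate.confidence < min_confidence:
        return "defer", "confidence below threshold; pause and verify with the user"
    if candidate.intent_alignment < min_alignment:
        return "partial", "execute only reversible steps pending confirmation"
    return "admit", "within the current commitment set"
```

The key property is that the planner never executes directly: every candidate passes through the gate, and every non-admit outcome carries a structured rationale that feeds the attestation record.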

The new commercial surface is governed agentic automation for enterprises that cannot accept the per-step success-rate framing and that need cross-step structural integrity, auditability, and graceful pause as architectural properties. The cognitive architecture belongs to the customer's tenant, not to the action-model vendor, so behavioral lineage is portable across vendor refreshes and across multi-vendor agent fleets — Adept's residual products, computer-use-class agents from frontier labs, embedded suite agents, and custom in-house agents all participate in the same substrate. That portability deepens Adept's relationship with the customer because the action model becomes the differentiated execution capability accessed through a stable, audit-grade cognitive substrate, rather than a closed system the customer must take or leave wholesale.

5. Commercial and Licensing Implication

The fitting arrangement is an embedded cognitive-substrate license: Adept embeds the AQ human-relatable intelligence primitive as the cognitive layer above its action agents and sub-licenses participation to its enterprise customers as part of the deployment contract. Pricing is per-tenant or per-delegated-principal rather than per-action, which aligns with how enterprises actually consume governed agentic automation — they buy a relationship of delegation, not a meter of clicks. Premium tiers cover regulated-industry deployments, multi-vendor agent fleets, and cross-tenant federations where the cognitive substrate spans Adept, computer-use agents from frontier labs, and embedded suite agents under a single coherence and attestation policy.

What Adept gains: a structural answer to the agentic-trust problem that has constrained enterprise rollout to low-stakes workflows under heavy human supervision; a defensible architectural position against the broader action-agent category (including the action-model successors inside Amazon's umbrella, computer-use modes from frontier labs, and embedded suite agents) by elevating the architectural floor from step execution to governed delegation; and forward compatibility with EU AI Act high-risk-system obligations, the NIST AI RMF, SOC 2 expectations evolving toward agent activity, and emerging financial-services and healthcare regulator expectations for agentic automation. What the customer gains: portable behavioral lineage, coherent multi-vendor agent fleets, graceful pause that protects the workflow rather than corrupting it, narrative-identity continuity that survives interruption, and conformity attestation that makes agent behavior structurally auditable. The honest framing: the AQ primitive does not replace the action agent; it gives the action agent the cognitive architecture that enterprise-grade delegation has always needed and that no per-step optimizer can produce.

Invented by Nick Clark
Founding Investors: Anonymous, Devin Wilkie