Figure's Humanoid Learns Tasks Without Knowing Its Envelope
by Nick Clark | Published March 28, 2026
Figure AI is building a general-purpose humanoid robot that acquires manipulation and locomotion skills through imitation learning, reinforcement learning, and foundation model integration. The approach targets a humanoid that can learn new tasks from demonstration and language instruction rather than requiring explicit programming for each behavior. The ambition is substantial and the engineering is advancing rapidly. But learned skills do not inherently carry capability self-awareness. A policy that learned to make coffee in training does not know whether it can make coffee right now, with the current gripper condition, battery level, and environmental layout. Capability awareness provides this: persistent envelopes that track what learned skills can actually accomplish in current conditions. This article positions Figure AI against the AQ capability-awareness primitive disclosed under provisional 64/049,409.
1. Vendor and Product Reality
Figure AI, founded in 2022 by Brett Adcock and headquartered in the San Francisco Bay Area, has emerged as one of the most capitalized and visible entrants in the general-purpose humanoid robotics race. The company has progressed publicly from Figure 01 through Figure 02 and beyond, with reported deployments at automotive manufacturing customers (BMW Spartanburg) and ongoing pilots in logistics and warehousing. Figure has assembled a substantial AI team to develop Helix and successor end-to-end vision-language-action models that drive whole-body humanoid behavior, and has raised capital at valuations in the multi-billion-dollar range, competing against peers Tesla Optimus, 1X Neo, Apptronik Apollo, and Agility Digit.
The architectural shape is well-understood: a full-size bipedal humanoid platform with multi-fingered or parallel-jaw end effectors, a sensor stack centered on cameras and proprioception, on-board compute sized for vision-language-action inference, and a learning pipeline combining teleoperated demonstrations, simulation reinforcement learning, and foundation-model-mediated language instruction. New skills are learned, distilled into the policy, tested in fleet operations, and deployed back to the units. The deployment model is a fleet of humanoids whose collective experience feeds central training, with policy updates flowing back to the field. Helix-class models are the company's bet that a sufficiently general vision-language-action model trained at scale can drive a humanoid through whole-body behavior in unstructured environments.
Figure's strengths are real: a credible humanoid platform iterated on a fast cadence, public manufacturing deployments that validate the form factor outside laboratory conditions, an AI organization sized to compete on the foundation-model axis, and a commercial go-to-market motion that takes the deployment problem seriously. Within its scope of learned-policy humanoid execution, the company is among the technical leaders. The product is the leading edge of what the analyst community calls "general-purpose humanoid" — robots whose value proposition is repurposability across tasks rather than fixed automation for one workflow.
2. The Architectural Gap
The structural property Figure's architecture does not exhibit is persistent, queryable capability self-awareness over its learned skill repertoire. A humanoid that has learned fifty manipulation skills possesses fifty policies; the architecture does not represent which of those policies are reliable in the current physical and environmental state. Each learned skill has an implicit capability envelope defined by the training distribution, by the embodiment state at training time, and by the conditioning context, but that envelope is not exposed as explicit, persistent state that the robot or its operator can query before committing to execution.
The gap matters because general-purpose deployment is precisely the regime where training-distribution coverage cannot be assumed. A humanoid moved from one warehouse aisle to another encounters lighting, surface texture, object property, and spatial layout variations the policy may or may not have seen. A gripper that has logged ten thousand pick cycles has different physical characteristics than the gripper at training time. Battery degradation across a shift affects motor precision in ways that affect manipulation envelope but not locomotion envelope, and the architecture has no representation that distinguishes them. The robot can attempt and observe success or failure; it cannot predict capability before execution, cannot decline a task on principled grounds, and cannot signal to the operator which skills have narrowed envelopes today.
As the skill library grows under foundation-model-driven learning, the gap compounds rather than diminishes. More skills means more potential capability and, simultaneously, more ways to fail unpredictably. Vision-language-action models exacerbate this — a Helix-class model can be prompted to attempt almost anything, and the model's confidence in the output trajectory is not a substitute for grounded capability state. Figure cannot patch this from within the learned-policy architecture, because capability awareness is a structural property over the skill set, not a feature of any individual policy. Adding model uncertainty estimation does not produce capability tracking; adding self-supervised success classifiers does not produce envelope state; adding fleet-aggregated failure statistics does not produce per-unit, per-condition capability awareness. The envelope is an architectural shape, and Figure's shape is fundamentally that of a learned-policy executor without a capability-state substrate.
3. What the AQ Capability-Awareness Primitive Provides
The Adaptive Query capability-awareness primitive specifies that every learned or programmed skill be paired with a persistent capability envelope — a multi-dimensional state object that tracks the skill's reliability conditional on the robot's current physical state, environmental conditions, and recent execution history. The envelope is not a confidence score; it is a structured state with at least three load-bearing dimensions: physical-state conditioning (gripper wear, joint backlash, battery, thermal), environmental conditioning (lighting, surface, object class, spatial layout), and temporal trajectory (recent success rate, drift, recovery posture).
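To make the three load-bearing dimensions concrete, here is a minimal sketch of what such an envelope state object might look like. All class and field names (`PhysicalState`, `EnvState`, `CapabilityEnvelope`, the specific wear and battery thresholds) are illustrative assumptions, not taken from the disclosure:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class PhysicalState:
    """Embodiment conditioning (illustrative fields only)."""
    gripper_wear: float       # 0.0 (new) .. 1.0 (end of life)
    joint_backlash_mm: float
    battery_frac: float       # remaining charge, 0..1
    motor_temp_c: float

@dataclass
class EnvState:
    """Environmental conditioning (illustrative fields only)."""
    lux: float                # ambient lighting
    surface: str              # e.g. "conveyor", "pallet"
    object_class: str
    layout_id: str

@dataclass
class CapabilityEnvelope:
    """Persistent, queryable envelope for one learned skill."""
    skill: str
    # Temporal trajectory: a sliding window of recent outcomes.
    recent: deque = field(default_factory=lambda: deque(maxlen=50))

    def record(self, success: bool) -> None:
        """Post-execution telemetry updates the envelope."""
        self.recent.append(success)

    def success_rate(self) -> float:
        if not self.recent:
            return 0.0
        return sum(self.recent) / len(self.recent)

    def reliable(self, phys: PhysicalState, env: EnvState,
                 threshold: float = 0.9) -> bool:
        # Illustrative gating: apply hard physical-state limits
        # before comparing the historical rate to the threshold.
        if phys.battery_frac < 0.15 or phys.gripper_wear > 0.8:
            return False
        return self.success_rate() >= threshold
```

The point of the sketch is the shape, not the numbers: the envelope is a persistent object per skill, updated by every outcome and conditioned on physical and environmental state at query time, rather than a single scalar confidence.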
Temporal forecasting is intrinsic. The envelope projects how skill reliability will evolve over the next minutes, the next shift, and the next maintenance interval. Battery depletion narrows manipulation envelopes before it narrows locomotion envelopes; thermal accumulation narrows precision-dependent skills before bulk-handling skills; gripper wear narrows fine manipulation before it narrows coarse pick. The joint condition of capability, time, and uncertainty enables the robot — or the operator's planner — to accept, decline, defer, or substitute tasks based on whether the current and projected skill envelopes can reliably accomplish them.
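A toy illustration of the accept/decline/defer logic, assuming a simple linear battery-drain projection; the thresholds and the linear model are stand-in assumptions, not part of the primitive's specification:

```python
def project_battery(frac_now: float, drain_per_min: float,
                    minutes: float) -> float:
    """Illustrative linear forecast of remaining charge."""
    return max(0.0, frac_now - drain_per_min * minutes)

def decide(skill_rate: float, battery_now: float,
           drain_per_min: float, task_minutes: float,
           min_rate: float = 0.9, min_battery: float = 0.15) -> str:
    """Return 'accept', 'defer', or 'decline' for one task.

    Declines when the envelope is already narrowed; defers when
    the projection says it will narrow before completion.
    """
    if skill_rate < min_rate:
        return "decline"
    if project_battery(battery_now, drain_per_min, task_minutes) < min_battery:
        return "defer"
    return "accept"
```

Even this toy version shows why the joint condition of capability, time, and uncertainty matters: the same skill at the same success rate can be acceptable now and deferrable an hour into the shift.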
The primitive is technology-neutral: any policy class (learned, model-predictive, classical), any sensor stack, any embodiment. It composes hierarchically — per-skill, per-unit, per-fleet, per-deployment — so an operator can query the envelope of a single robot, a station, or the whole fleet under one model. The recursive closure is load-bearing: every execution outcome updates the envelope, and the updated envelope re-enters as a credentialed observation that downstream planning admits and weights. The inventive step disclosed under USPTO provisional 64/049,409 is the persistent capability-envelope substrate as a structural condition for self-aware learned-skill deployment, distinct from any specific learning algorithm, policy class, or robot platform.
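Hierarchical composition can be sketched as a query over per-unit envelope state. The mapping shape (`unit_id` to per-skill success rates) is a simplifying assumption for illustration:

```python
def fleet_capable_units(envelopes: dict, skill: str,
                        threshold: float = 0.9) -> list:
    """Aggregate per-unit envelopes into a fleet-level view.

    `envelopes` maps unit_id -> {skill_name: success_rate}.
    Returns the units whose envelope for `skill` clears the bar.
    """
    return sorted(unit for unit, skills in envelopes.items()
                  if skills.get(skill, 0.0) >= threshold)
```

An operator's planner could run the same query at station or deployment scope by swapping the dictionary it passes in, which is the sense in which the primitive composes under one model.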
4. Composition Pathway
Figure integrates with AQ as a humanoid embodiment running over the capability-awareness substrate. What stays at Figure: the humanoid platform, the sensor stack, the on-board compute, Helix-class vision-language-action models, the demonstration pipeline, the fleet operations infrastructure, and the entire customer relationship. Figure's investment in humanoid hardware iteration and foundation-model-mediated whole-body control remains its differentiated layer. The composition is additive: capability awareness wraps the existing skill execution pipeline rather than replacing any part of it.
What composes through AQ as substrate: every learned skill is paired with a capability envelope that updates from execution telemetry and conditions on physical and environmental state. The integration points are well-defined. Skill registration emits an envelope schema seeded from training-distribution metadata. Pre-execution, the planner queries the envelope under current conditions and either admits, declines, or downgrades the task — a humanoid asked to perform a fine manipulation skill at low battery and high thermal load returns "envelope narrowed, defer or substitute" rather than attempting and failing. Post-execution telemetry — success, partial, drift, recovery — updates the envelope as a credentialed observation. Foundation-model-mediated task acceptance is gated by envelope query: Helix can propose any plan, but execution is admissible only when the envelope confirms the constituent skills are within reliable bounds for current conditions.
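The plan-gating integration point above can be reduced to a simple check over the plan's constituent skills. This is a hypothetical sketch of the gate, not Figure's or AQ's actual interface:

```python
def admit_plan(plan_skills: list, envelope_rates: dict,
               threshold: float = 0.9) -> tuple:
    """Gate a proposed plan on per-skill envelope state.

    A VLA model may propose any skill sequence; the plan is
    admissible only if every constituent skill's envelope is
    within bounds. Returns (admitted, narrowed_skills) so the
    planner can defer or substitute the narrowed skills.
    """
    narrowed = [s for s in plan_skills
                if envelope_rates.get(s, 0.0) < threshold]
    return (len(narrowed) == 0, narrowed)
```

The useful property is that a rejection names the narrowed skills rather than failing opaquely, which is what lets the planner downgrade or substitute instead of attempting and failing.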
The new commercial surface is self-aware humanoid deployment for industrial customers — automotive manufacturing, logistics, warehouse operations, regulated facilities — that need predictable, declinable, auditable robot behavior rather than best-effort learned-policy execution. The capability envelope belongs to the customer's operating context and is portable across Figure software updates and platform revisions, which paradoxically makes Figure stickier: hardware quality and learning velocity become the differentiated means of feeding and exploiting that substrate. For the operator, the value is the ability to run shift planning and task assignment against the actual capability state of the fleet rather than against an abstract skill catalog.
5. Commercial and Licensing Implication
The fitting arrangement is an embedded substrate license: Figure embeds the AQ capability-awareness primitive into the humanoid runtime and sub-licenses envelope participation to its industrial customers as part of the platform deployment. Pricing aligns with how customers actually consume self-aware robotics — per-unit-envelope, per-fleet, or per-credentialed-skill — rather than per-robot-hour, which preserves Figure's existing deployment economics while introducing a new high-value self-aware tier above them.
What Figure gains: a structural answer to the "trust the learned policy" problem that customer safety reviews increasingly raise as humanoid deployments scale beyond pilot sites; a defensible position against Tesla Optimus, 1X, Apptronik, and Agility by elevating the architectural floor from learned-policy executor to self-aware substrate; and a forward-compatible posture against ISO 10218, IEC 61508, FDA-style learned-system guidance, EU AI Act high-risk classification, and OSHA general-duty enforcement that are converging on capability-state and predictability requirements for autonomous machines in human environments.

What the customer gains: predictable, declinable, auditable humanoid behavior; envelope-grounded shift and task planning that survives Figure software updates; and a structural record that supports incident review, insurance underwriting, and regulatory inspection.

Honest framing — the AQ primitive does not replace learning; it gives learned-skill humanoids the self-awareness substrate they have always needed and never had.