Auditable Harm-Minimization for Autonomous Vehicles: Declared Harm Ordering, Not Emergent Heuristics

Nick Clark

Domain Context

The regulatory landscape for harm minimization in autonomous driving has crystallized over the past several years into a small set of formal frameworks, each of which presupposes that an autonomous vehicle's harm-tradeoff behavior is inspectable. The U.S. National Highway Traffic Safety Administration's ADS-equipped Vehicle Safety, Transparency, and Evaluation Program (AV STEP), proposed as a voluntary framework and already functioning as a de facto reference for federal oversight, asks operators to characterize how their systems behave in conflict scenarios and how that behavior is governed. A class of formal responsibility models, exemplified by published responsibility-sensitive-safety (RSS) formulations, proposes mathematical definitions of minimum safe distances, response times, and reasonable-foresight envelopes that produce a verifiable definition of "fault" in a collision. UN ECE Regulation 157, the Automated Lane Keeping Systems (ALKS) regulation, prescribes specific harm-minimization behaviors in named scenarios and has been adopted across the EU, Japan, and Korea.

Beneath these frameworks sit harder ethical instruments. Germany's 2017 Ethics Commission on Automated and Connected Driving issued twenty rules, several of which directly address harm ordering: a prohibition on personal-feature-based discrimination among potential victims, a requirement that unavoidable harm be minimized in aggregate, and a general principle that harm-ordering rules be transparent rather than emergent. The European Commission's 2020 Ethics of Connected and Automated Vehicles report and the IEEE 7000-series standards on ethically aligned design extend the same posture into international guidance.

These frameworks share a structural assumption that current AV stacks largely fail to satisfy: that the harm ordering against which the vehicle acts is a declared, externalizable, and replaceable artifact, not a property emergent from millions of lines of planner and prediction code.

Architectural Requirement

A harm-minimization-honest autonomous driving stack must expose three artifacts to its regulator, its operator, and its post-incident investigator. First, the active harm ordering, meaning the priority lattice over outcome classes (occupant safety, vulnerable-road-user safety, property damage, traffic-flow disruption, comfort) that the vehicle is using at a given moment, must be a named, signed object retrievable for any moment in operational history. Second, every decision in which two or more orderable harms were in tension must produce a structural event recording which orderings were evaluated, which alternative trajectories were considered, and which deviation from the declared ordering, if any, was incurred. Third, deviations must themselves be credentialed: a vehicle that departs from its declared ordering (because, say, sensor uncertainty triggered a fallback) must record the credential under which the deviation was authorized.
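The three artifacts can be pictured as plain data objects. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the class and field names (`HarmOrdering`, `TensionEvent`, `deviation_credential`) are hypothetical, the priority order shown is a placeholder, and HMAC-SHA256 from the Python standard library stands in for whatever signature scheme a signing authority would actually use.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HarmOrdering:
    """Artifact 1: a named harm ordering (hypothetical shape)."""
    ordering_id: str
    jurisdiction: str
    priority: Tuple[str, ...]  # outcome classes, highest priority first

    def canonical_bytes(self) -> bytes:
        # Stable serialization so the same ordering always signs identically.
        payload = {
            "ordering_id": self.ordering_id,
            "jurisdiction": self.jurisdiction,
            "priority": list(self.priority),
        }
        return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_ordering(ordering: HarmOrdering, key: bytes) -> str:
    # ASSUMPTION: HMAC is a stand-in for a real authority signature.
    return hmac.new(key, ordering.canonical_bytes(), hashlib.sha256).hexdigest()


def verify_ordering(ordering: HarmOrdering, signature: str, key: bytes) -> bool:
    return hmac.compare_digest(sign_ordering(ordering, key), signature)


@dataclass(frozen=True)
class TensionEvent:
    """Artifacts 2 and 3: a per-decision record with a credentialed deviation."""
    timestamp: float
    ordering_signature: str
    harms_in_tension: Tuple[str, ...]
    trajectories_considered: Tuple[str, ...]
    deviation: Optional[str] = None
    deviation_credential: Optional[str] = None

    def is_well_formed(self) -> bool:
        # A recorded deviation must carry the credential that authorized it.
        return self.deviation is None or self.deviation_credential is not None


AUTHORITY_KEY = b"hypothetical-authority-key"
ordering = HarmOrdering(
    "ord-001",
    "DE",
    # Placeholder order only; the real order is set by the signing authority.
    ("occupant_safety", "vulnerable_road_user_safety", "property_damage",
     "traffic_flow_disruption", "comfort"),
)
signature = sign_ordering(ordering, AUTHORITY_KEY)
```

Under these assumptions, any change to the ordering after signing causes `verify_ordering` to fail, and an event that records a deviation without a credential is flagged as malformed.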

Without these three artifacts, RSS conformance becomes a statistical claim, ALKS conformance becomes a scenario-coverage exercise, and Ethics Commission compliance becomes rhetorical. With them, each becomes a verifiable architectural property.

Why Procedural Compliance Fails

The dominant industry posture toward harm minimization is procedural and post-hoc. Operators publish safety case documents that describe, in natural language, how their systems prefer to behave; they accumulate scenario libraries demonstrating in-distribution behavior; they reconstruct, after incidents, what the stack did and why it appeared reasonable. None of these mechanisms exposes the harm ordering as an inspectable object, and none of them survives the conditions under which harm-minimization regulation actually bites: a fatal collision in which two harm classes were in tension, a regulator who asks which ordering was active, and a courtroom in which the answer must be defensible against expert scrutiny.

The procedural posture also cannot survive jurisdictional pluralism. A vehicle operating in Munich is subject to a harm ordering shaped by the German Ethics Commission's twenty rules; the same physical vehicle, driving across the border into France or Switzerland, is subject to a different ethical and legal substrate; the same model, deployed in California, Texas, or Singapore, encounters yet other priority lattices. An implementation-embedded ordering forces the operator to either ship a single global compromise (and accept that the compromise is illegal somewhere) or maintain divergent codebases per jurisdiction (and accept the verification cost). Neither is sustainable as deployment scales.

The deepest failure is in incident review. When a regulator asks why the vehicle preferred occupant deceleration over evasive steering in a near-miss with a vulnerable road user, a procedural stack answers with a planner-trace reconstruction: feature weights, predicted-trajectory rankings, cost terms. The regulator cannot tell whether the answer represents a declared ordering that was honored or an emergent ordering that was rationalized after the fact. The asymmetry of evidence is itself the regulatory exposure.

What Governed Actuation Provides

The Governed Actuation layer treats every physical actuation (every command to a brake-by-wire, steer-by-wire, or throttle-by-wire actuator) as a governed, revocable, auditable act rather than a direct command. Each proposed actuation is evaluated through a composite admissibility evaluator before execution, and the evaluator emits a graded outcome (admit, gate, defer, solicit, reject, or escalate) rather than a binary go/no-go. The harm-minimization deviation primitive disclosed in the provisional application sits directly on top of this: when a governed unit confronts a configuration in which no available actuation path avoids harm, it selects the path that minimizes composite projected harm across all entities in the spatial region.
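The difference between a direct command and a governed one can be sketched as a thin wrapper around the actuator. This is an illustrative sketch only: the six outcome names come from the text above, but `GovernedActuator`, `toy_evaluator`, the 0.5 steering threshold, and the rule that only an `admit` outcome is dispatched are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    ADMIT = "admit"
    GATE = "gate"
    DEFER = "defer"
    SOLICIT = "solicit"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Actuation:
    actuator: str  # e.g. "brake", "steer", "throttle"
    value: float


@dataclass
class GovernedActuator:
    evaluator: Callable[[Actuation], Outcome]
    audit_log: List[Tuple[Actuation, Outcome]] = field(default_factory=list)
    dispatched: List[Actuation] = field(default_factory=list)

    def propose(self, act: Actuation) -> Outcome:
        outcome = self.evaluator(act)
        # Every proposal is recorded, whether or not it is executed.
        self.audit_log.append((act, outcome))
        # ASSUMPTION: in this sketch only ADMIT reaches the actuator.
        if outcome is Outcome.ADMIT:
            self.dispatched.append(act)
        return outcome


def toy_evaluator(act: Actuation) -> Outcome:
    # Hypothetical rule: large steering inputs go to a higher authority.
    if act.actuator == "steer" and abs(act.value) > 0.5:
        return Outcome.ESCALATE
    return Outcome.ADMIT
```

With this wrapper, a proposal such as `GovernedActuator(toy_evaluator).propose(Actuation("steer", 0.9))` is logged and withheld rather than executed, which is the behavior a binary go/no-go interface cannot record.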

The harm ordering is a first-class, policy-defined object. In the disclosed primitive it is an entity-class harm ordering: a governance-policy-defined ranking of harm coefficients across entity classes (for example, the highest coefficient assigned to human life and bodily integrity, lower coefficients to animal life, to physical property, to the governed unit itself, and the lowest to designated replaceable fixtures such as guardrails, shoulder berms, and crumple zones). The ordering is configurable by the deploying authority per jurisdictional regulatory framework, per deployment-domain ethical framework, and per organizational policy. The specific ordering is immaterial to the primitive; what the primitive supplies is the composite harm-admissibility evaluation across that declared ordering, combined with the admissibility evaluator, and a deviation-lineage record of the decision.

In a tension scenario, a candidate-path generator produces the actuation paths available given the unit's current kinematic state, its capability envelope, and its observed environment. A harm projector projects composite expected harm across entities over each candidate path, consuming the entity-class harm coefficients and an empathy-weighting input from the integrity engine. The evaluator combines per-path projected harm with per-path admissibility into a composite harm-admissibility score, and the path selector chooses the most favorable score. A deviation-lineage recorder then records the candidate-path set, the harm projections per path, the governance-policy parameters applied, the selected path, and the execution, so the decision is deterministically reconstructible after the fact rather than reconstructed by narrative. Post-actuation verification compares observed effects against expected effects and feeds closed-loop refinement.
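The flow just described (candidate paths, harm projection, composite scoring, selection, lineage record) can be sketched in a few functions. Everything numeric here is hypothetical: the coefficient values, the additive `ADMISSIBILITY_PENALTY` combination rule, and the three candidate paths are assumptions chosen for illustration, since the source does not specify how projected harm and admissibility are combined. The empathy-weighting input, execution, and post-actuation verification are omitted.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Hypothetical entity-class harm coefficients (higher = more protected).
HARM_COEFFICIENTS: Dict[str, float] = {
    "human": 1000.0, "animal": 50.0, "property": 10.0, "self": 5.0, "fixture": 1.0,
}
ADMISSIBILITY_PENALTY = 100.0  # hypothetical weight on inadmissibility

# An exposure is (entity_class, probability_of_harm, severity in [0, 1]).
Exposure = Tuple[str, float, float]


def project_harm(exposures: List[Exposure], coeff: Dict[str, float]) -> float:
    return sum(coeff[cls] * p * sev for cls, p, sev in exposures)


def score_paths(candidates: Dict[str, dict], coeff: Dict[str, float],
                penalty: float) -> Dict[str, float]:
    # ASSUMPTION: composite score = projected harm + penalty * (1 - admissibility).
    return {
        name: project_harm(c["exposures"], coeff) + penalty * (1.0 - c["admissibility"])
        for name, c in candidates.items()
    }


def select(scores: Dict[str, float]) -> str:
    # Lowest score wins; ties break alphabetically so selection is deterministic.
    return min(sorted(scores), key=scores.get)


def decide(candidates: Dict[str, dict]) -> dict:
    scores = score_paths(candidates, HARM_COEFFICIENTS, ADMISSIBILITY_PENALTY)
    record = {
        "candidates": candidates,
        "projections": {n: project_harm(c["exposures"], HARM_COEFFICIENTS)
                        for n, c in candidates.items()},
        "policy": {"coefficients": dict(HARM_COEFFICIENTS),
                   "penalty": ADMISSIBILITY_PENALTY},
        "selected": select(scores),
    }
    # Digest over the record contents makes later tampering detectable.
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    record["digest"] = hashlib.sha256(body).hexdigest()
    return record


def replay(record: dict) -> str:
    # Reconstruct the decision from the record alone, not from live state.
    policy = record["policy"]
    return select(score_paths(record["candidates"], policy["coefficients"],
                              policy["penalty"]))


CANDIDATES = {
    "brake_straight": {"exposures": [("human", 0.4, 1.0)], "admissibility": 0.9},
    "swerve_left": {"exposures": [("property", 0.9, 1.0), ("human", 0.05, 1.0)],
                    "admissibility": 0.7},
    "steer_into_guardrail": {"exposures": [("fixture", 1.0, 1.0), ("self", 1.0, 0.8)],
                             "admissibility": 0.8},
}
```

With these placeholder numbers, `decide(CANDIDATES)` selects `steer_into_guardrail`, and `replay` recovers the same choice from the stored record, which is the sense in which the decision is reconstructible from data rather than from narrative.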

Two further mechanisms in the layer matter for harm cases. First, reversibility-aware commitment-point evaluation classifies each candidate path on a reversibility ontology (reversible, partially reversible, irreversible, time-bounded, condition-bounded, probabilistic) and elevates admissibility thresholds for paths with irreversible sub-steps, preferring reversible paths where both are admissible and detecting the commitment point beyond which continuation becomes irreversible. Second, the harm-minimization deviation integrates with the graduated-actuation mode selector: a high-confidence deviation proceeds in full mode; a lower-confidence deviation proceeds in constrained mode; an uncertain scenario proceeds in consultative mode when a human operator or higher-authority agent is reachable within the decision horizon; and a path exceeding governance-policy risk thresholds resolves to disabled mode with the decision lineage-recorded. The selector's full set of modes (disabled, simulated, advisory, consultative, shadowed, partial, constrained, stage-gated, deferred, full, emergency-accelerated) gives the stack a vocabulary for cases that binary execute-or-suppress architectures cannot articulate cleanly. Notably, the primitive admits self-damaging actuation paths (steering into a guardrail or crumple zone to avoid a pedestrian) as candidates rather than excluding them categorically, which prior pre-collision and emergency-steering systems do not.
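Both mechanisms reduce to small classification functions. The sketch below is a hedged illustration: the ontology and mode names come from the text, but every threshold constant, the choice to treat the first irreversible sub-step as the commitment point, and the fallback used when no operator is reachable are assumptions. Only four of the eleven modes are modeled.

```python
from enum import Enum
from typing import Optional, Sequence


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PARTIALLY_REVERSIBLE = "partially_reversible"
    IRREVERSIBLE = "irreversible"
    TIME_BOUNDED = "time_bounded"
    CONDITION_BOUNDED = "condition_bounded"
    PROBABILISTIC = "probabilistic"


class Mode(Enum):
    DISABLED = "disabled"
    CONSTRAINED = "constrained"
    CONSULTATIVE = "consultative"
    FULL = "full"


# Hypothetical governance-policy parameters.
BASE_THRESHOLD = 0.70
IRREVERSIBLE_UPLIFT = 0.15
HIGH_CONFIDENCE = 0.90
LOW_CONFIDENCE = 0.60
RISK_LIMIT = 0.80


def admissibility_threshold(sub_steps: Sequence[Reversibility]) -> float:
    # Paths containing an irreversible sub-step must clear a higher bar.
    if any(s is Reversibility.IRREVERSIBLE for s in sub_steps):
        return min(1.0, BASE_THRESHOLD + IRREVERSIBLE_UPLIFT)
    return BASE_THRESHOLD


def commitment_point(sub_steps: Sequence[Reversibility]) -> Optional[int]:
    # ASSUMPTION: the commitment point is the first irreversible sub-step.
    for index, step in enumerate(sub_steps):
        if step is Reversibility.IRREVERSIBLE:
            return index
    return None


def select_mode(confidence: float, risk: float, operator_reachable: bool) -> Mode:
    if risk > RISK_LIMIT:
        return Mode.DISABLED
    if confidence >= HIGH_CONFIDENCE:
        return Mode.FULL
    if confidence >= LOW_CONFIDENCE:
        return Mode.CONSTRAINED
    if operator_reachable:
        return Mode.CONSULTATIVE
    # ASSUMPTION: the source does not say what happens when the scenario is
    # uncertain and no operator is reachable; this sketch uses CONSTRAINED.
    return Mode.CONSTRAINED
```

For example, `select_mode(0.3, 0.1, True)` resolves to consultative mode, while the same confidence with a risk of 0.9 resolves to disabled mode regardless of operator availability.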

Compliance Mapping

The primitive maps directly onto the major regulatory and standards frameworks. Against NHTSA AV STEP, the declared ordering and the per-decision deviation events supply the transparency the program requests, in a form that holds up under incident-driven scrutiny rather than having to be assembled in response to it. Against RSS, the harm ordering encodes the safety-distance and response-time properties RSS requires, while the deviation record makes RSS conformance a verifiable per-event property rather than a statistical aggregate. Against UN R157 ALKS, each named scenario maps to a specific decision class within the ordering, and per-decision records demonstrate scenario conformance directly.

Against the German Ethics Commission's twenty rules, particularly the prohibition on personal-feature discrimination and the aggregate-harm-minimization rule, the policy-signed ordering becomes the artifact under which conformance is asserted. A regulator can read the ordering and verify that no protected attribute appears as a priority criterion; an operator can demonstrate, across millions of decisions, that the active ordering was the one signed by the appropriate authority. Against ISO 21448 (Safety of the Intended Functionality, SOTIF) and ISO 26262 (functional safety), the harm-ordering object provides the explicit hazard-prioritization layer those standards require but do not architecturally specify.
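The two checks named above, one by the regulator and one by the operator, are simple enough to sketch directly. The personal features listed are those the Ethics Commission's rules name (age, gender, physical or mental constitution); the function names, the criterion strings, and the decision-record shape with `decision_id` and `ordering_id` keys are hypothetical.

```python
from typing import Dict, Iterable, List

# Personal features that may not serve as a priority criterion.
PROTECTED_FEATURES = frozenset(
    {"age", "gender", "physical_constitution", "mental_constitution"}
)


def protected_criteria(criteria: Iterable[str]) -> List[str]:
    """Regulator check: which criteria in the ordering are protected features?"""
    return sorted(set(criteria) & PROTECTED_FEATURES)


def unsigned_decisions(decisions: Iterable[Dict[str, str]],
                       signed_ordering_ids: Iterable[str]) -> List[str]:
    """Operator check: which decisions ran under an ordering nobody signed?"""
    signed = set(signed_ordering_ids)
    return [d["decision_id"] for d in decisions if d["ordering_id"] not in signed]
```

An empty result from both functions is the conformance evidence in this sketch: no protected attribute appears in the ordering, and every recorded decision references a signed ordering.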

Jurisdictional pluralism resolves naturally: the same vehicle, crossing a regulatory boundary, loads the ordering signed by the receiving jurisdiction's authority, and the transition is itself a credentialed event. There is no global compromise and no codebase fork.
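A boundary crossing of this kind might look like the following sketch. It assumes a hypothetical `Vehicle` object holding one verification key per jurisdiction, uses HMAC-SHA256 as a stand-in for a real authority signature, and assumes that a rejected ordering leaves the previously active ordering in place; none of these details are specified by the source.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


def sign(key: bytes, jurisdiction: str, priority: Sequence[str]) -> str:
    # ASSUMPTION: HMAC over a canonical JSON payload stands in for a signature.
    payload = json.dumps(
        {"jurisdiction": jurisdiction, "priority": list(priority)}, sort_keys=True
    ).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


@dataclass
class Vehicle:
    authority_keys: Dict[str, bytes]
    active_jurisdiction: str = ""
    active_priority: Tuple[str, ...] = ()
    transitions: List[dict] = field(default_factory=list)

    def enter(self, jurisdiction: str, priority: Sequence[str], signature: str) -> bool:
        key = self.authority_keys.get(jurisdiction)
        valid = key is not None and hmac.compare_digest(
            sign(key, jurisdiction, priority), signature
        )
        # The transition is recorded whether or not the ordering is accepted.
        self.transitions.append({
            "from": self.active_jurisdiction,
            "to": jurisdiction,
            "signature": signature,
            "accepted": valid,
        })
        if valid:
            self.active_jurisdiction = jurisdiction
            self.active_priority = tuple(priority)
        # ASSUMPTION: on rejection the previous ordering stays active.
        return valid
```

The same code path serves every jurisdiction; only the signed ordering and the verifying key change, which is the sense in which no codebase fork is needed.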

Adoption Pathway

Adoption begins where the regulatory pressure is sharpest: in the operator's primary deployment jurisdiction, on the harm-classification axis the local regulator is most likely to interrogate. For a U.S. robotaxi operator, this typically means vulnerable-road-user prioritization under the NHTSA Standing General Order regime; for an EU operator, ALKS-named scenarios under R157; for a Chinese operator, the harm-minimization clauses of the GB/T autonomous-vehicle standard series. Initial deployment surfaces the active ordering as a signed artifact, instruments tension scenarios with structural decision records, and replaces planner-trace reconstruction with credentialed evidence.

The second adoption step extends the ordering to cover the operator's full deployment surface and adds the deviation-credential infrastructure that lets the architecture record fallback behavior under sensor degradation, edge-case detection, or operator-commanded mode change. The third step generalizes to cross-jurisdictional operation: the ordering is signed by the receiving jurisdiction's authority on entry, and the architecture's transition record satisfies the cross-border auditability that current stacks cannot meet.

The cumulative effect is that the operator's harm-minimization posture becomes a regulatory asset rather than a regulatory liability. Incident review becomes a verification exercise against declared orderings; jurisdictional expansion becomes an ordering-signature engagement rather than a software fork; ethical scrutiny becomes a question about which ordering is authorized rather than about which behavior emerged. The hard ethical questions remain. They belong to the policy-signers, not to the autonomy stack, and the architecture stops obscuring where they live.

Disclosure Scope

This article describes an application of the Governed Actuation layer of the governed spatial mesh, including its confidence-governed execution primitive, composite admissibility evaluator (admit, gate, defer, solicit, reject, escalate), graduated-actuation mode selector, reversibility-aware commitment-point evaluation, harm-minimization deviation mechanism over a governance-policy-defined entity-class harm ordering, and lineage-recorded actuation provenance, as disclosed in U.S. Provisional Application No. 64/049,409. The named regulatory and standards regimes (NHTSA AV STEP, UN ECE R157 ALKS, ISO 21448 SOTIF, ISO 26262, and the cited national ethics instruments) are referenced as domain and compliance framing only; they are not claims of the application, and their accurate description here does not assert conformance certification. The entity-class harm ordering, jurisdictional signing authorities, and confidence-threshold sets described above are governance-policy-defined parameters, configurable by the deploying authority; the inventive subject matter is the architectural primitive that evaluates and records harm-minimization deviations against such a declared ordering, not any particular ordering.