Distributed Scientific Computing With Governed Agents
by Nick Clark | Published March 27, 2026
Federally funded research now operates under a stack of overlapping obligations: the 2022 OSTP Nelson Memo and its NSF and NIH implementations require immediate public access to peer-reviewed publications and the underlying data; the FAIR principles supply the international vocabulary against which funders evaluate data management plans; ORCID and DOI infrastructure binds outputs to identities and to citable artifacts; ICH GCP, IPAC, and HIPAA constrain clinical computation; ISO/IEC 27001 governs information security for facilities that process regulated data; and ITAR and EAR restrict where export-controlled simulations may be executed and by whom. Scientific computing distributes computation across clusters, clouds, and grids, but it does not distribute the governance these regimes demand. A cognition-native execution platform closes that gap by representing each scientific workload as a governed autonomous agent that carries its provenance, enforces its reproducibility constraints, and respects its export-control envelope wherever it executes.
Regulatory Framework
The August 2022 OSTP memorandum, commonly known as the Nelson Memo, directed all federal agencies that fund research, removing the 100-million-dollar annual threshold of its 2013 predecessor, to update their public-access policies by the end of 2025. The NSF Public Access Plan and the NIH Data Management and Sharing Policy implement the directive. Both require that peer-reviewed publications resulting from federal funding be made freely available without embargo, that the data underlying the conclusions be deposited in an appropriate repository with a persistent identifier, and that the data be described by metadata sufficient for reuse. The NIH policy further requires a Data Management and Sharing Plan as part of every funding application and ties continued funding to demonstrated execution of the plan.
The FAIR Guiding Principles for Scientific Data Management and Stewardship, published in 2016 and adopted in substance by funders worldwide, articulate four properties: findable, accessible, interoperable, and reusable. ORCID identifiers attach persistent author identity to research outputs. DataCite and Crossref DOIs attach persistent identity to the outputs themselves. Together they constitute the citation graph against which funder reporting now operates. For clinical and translational work, ICH E6(R3) Good Clinical Practice imposes computational-system validation, audit-trail, and electronic-records obligations that extend to any analytical pipeline contributing to a regulatory submission. The Investigational Product Accountability requirements of IPAC and the security baseline of ISO/IEC 27001 govern the surrounding infrastructure. For physics, materials, and aerospace simulation, ITAR and EAR restrict execution of controlled technical data to authorized facilities and personnel, with criminal penalties for non-compliant export including movement of code or data to a foreign-operated cloud region.
Architectural Requirement
The combined effect of these regimes is that a computation is no longer a transient process: it is an artifact whose existence must be explicable, citable, reproducible, and lawful. The artifact must carry a complete record of its inputs, its code, its parameters, and its environment. It must be addressable by persistent identifier and bound to identified researchers. It must be reproducible by independent parties under the constraints of the originating governance. It must, where export-controlled, refuse execution outside its authorized envelope. And it must, where clinical, satisfy validation and audit-trail requirements that survive personnel turnover and infrastructure migration.
These properties cannot be retrofitted by metadata systems sitting beside the computation. They must be properties of the computation itself, evaluable wherever the computation executes, and verifiable by parties who do not trust the originating infrastructure.
Why Procedural Compliance Fails
The dominant pattern in scientific computing is to track provenance and policy externally. README files, lab notebooks, electronic lab notebooks, data management plans filed with funders, IRB protocols, and shared filesystem conventions describe what the computation should be doing. Job schedulers, workflow managers, and container registries describe what executed. The two records reconcile only through human attention, which is intermittent, uneven across collaborators, and effectively absent at the moment of multi-institutional handoff when the reconciliation matters most.
Workflow managers such as Nextflow, Snakemake, Galaxy, and Common Workflow Language solve part of this problem by formalizing the pipeline graph. They specify what steps execute in what order with what dependencies. They do not specify what governance the execution must satisfy: which library versions are admissible, which numerical-precision settings the conclusions depend on, which data inputs are licensed for the destination jurisdiction, which collaborators are authorized to view intermediate states. A pipeline executed on a different cluster with a different BLAS implementation, a different floating-point determinism setting, or a different Python minor version produces different results from the same definition. The pipeline's governance is not in the pipeline; it is in the cluster's site policy, which travels with the cluster, not with the work.
Container images partially address environment reproduction but not governance. A Docker or Singularity image fixes the execution environment at build time. It does not encode that the data within may not leave a U.S.-person-only facility, that the analysis must record an audit trail to GCP standards, or that the output must be deposited under a specific DOI prefix tied to the funding award. Containers also do not record what they did. They run, they exit, and the question of what computation contributed to which figure in which paper devolves once again to external bookkeeping.
The reproducibility crisis is a substantial and well-documented consequence of this gap. Published computational results frequently cannot be reproduced even by the original authors a year later, not because the science is wrong but because the governance of the computation was external to the computation and decayed faster than the conclusions it supported.
What the AQ Primitive Provides
A cognition-native execution platform represents each scientific computation as a governed agent. The agent is not a script that runs under a scheduler. It is a stateful entity carrying both its computation and its policy: the execution environment specification, the input-data version constraints, the numerical-precision requirements, the export-control classification, the IRB protocol identifier, the funding-award DOI prefix, the validation criteria the results must satisfy, and the human-oversight conditions under which deviation triggers stop-the-line behavior.
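The policy an agent carries can be sketched as a single structured value that travels with the computation. Every field name below is an illustrative assumption for exposition, not the platform's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the policy an agent carries alongside its computation.
# Field names are assumptions for exposition, not a real platform schema.
@dataclass(frozen=True)
class GovernancePolicy:
    environment_spec: dict       # pinned environment, e.g. {"python": "3.11.4"}
    input_versions: dict         # dataset identifier -> pinned content hash
    numerical_precision: str     # e.g. "fp64-deterministic"
    export_classification: str   # e.g. "uncontrolled", "ITAR-controlled"
    irb_protocol: Optional[str]  # IRB protocol identifier, if clinical
    doi_prefix: str              # funding-award DOI prefix for deposited outputs
    validation_criteria: tuple   # named checks the results must satisfy
    oversight_conditions: tuple  # deviations that trigger stop-the-line review

policy = GovernancePolicy(
    environment_spec={"python": "3.11.4", "blas": "openblas-0.3.23"},
    input_versions={"cohort-2025": "sha256:ab12"},
    numerical_precision="fp64-deterministic",
    export_classification="uncontrolled",
    irb_protocol=None,
    doi_prefix="10.5555",
    validation_criteria=("convergence_check",),
    oversight_conditions=("input_hash_mismatch",),
)
```

Because the policy is a value rather than site configuration, it serializes with the agent and is evaluable wherever the agent lands.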
The agent's lineage is its provenance. Every state mutation, every input consumed, every intermediate result, every environmental probe is recorded in append-only memory with cryptographic chaining. Provenance is not a separate metadata document that may or may not have been updated. It is intrinsic, continuously emitted, and verifiable by hash chain against the published artifact identifier.
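An append-only, hash-chained lineage of this kind can be sketched in a few lines. The event fields here are hypothetical; only the chaining discipline matters:

```python
import hashlib
import json

def _digest(entry: dict) -> str:
    # Canonical JSON (sorted keys) makes the hash reproducible across substrates.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

class Lineage:
    """Append-only, hash-chained event log (illustrative sketch)."""
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self.head = self.GENESIS

    def record(self, event: dict) -> str:
        # Each entry commits to the previous head, so any later tampering
        # with an earlier entry invalidates every hash that follows it.
        body = {"prev": self.head, "event": event}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        self.head = entry["hash"]
        return self.head

log = Lineage()
log.record({"kind": "input", "dataset": "cohort-2025", "sha256": "ab12"})
head = log.record({"kind": "step", "name": "normalize", "exit": 0})
# `head` is the value a verifier checks against the published artifact identifier.
```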
When the agent migrates between substrates, whether from a workstation to a campus cluster, to a national lab allocation, to a commercial cloud region, or across an institutional boundary in a multi-site collaboration, the agent presents its governance to the receiving substrate. The substrate either attests that it satisfies the governance, in which case execution proceeds and the attestation is recorded, or it refuses, in which case the agent does not execute. A cluster lacking the required library version, lacking the export-control facility clearance, or lacking the audit-trail capability rejects the agent rather than running it under degraded governance and emitting results whose provenance cannot honestly describe the conditions of their production.
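The attest-or-refuse handshake reduces to comparing the agent's requirements against the substrate's published capabilities. A minimal sketch, in which keys such as `libraries` and `clearances` are assumptions for illustration:

```python
def attest(substrate: dict, governance: dict):
    """Check an agent's governance against a substrate's published
    capabilities (sketch; all keys are illustrative assumptions)."""
    unmet = []
    for lib, version in governance.get("libraries", {}).items():
        if substrate.get("libraries", {}).get(lib) != version:
            unmet.append(f"library {lib}=={version}")
    for clearance in governance.get("clearances", []):
        if clearance not in substrate.get("clearances", []):
            unmet.append(f"clearance {clearance}")
    if unmet:
        return False, unmet  # the agent refuses to execute here
    return True, {"substrate": substrate["id"], "satisfies": governance}

cluster = {"id": "campus-hpc",
           "libraries": {"openblas": "0.3.23"},
           "clearances": ["audit-trail"]}
ok, detail = attest(cluster, {"libraries": {"openblas": "0.3.23"},
                              "clearances": ["audit-trail", "itar-facility"]})
# ok is False: the cluster lacks the ITAR facility clearance, so the agent
# is rejected rather than run under degraded governance.
```

On success, the attestation dict would itself be recorded in the lineage, which is what makes the execution conditions later verifiable.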
Coping intercepts close the operational loop. When the agent detects governance violation in flight, whether because an input dependency changed mid-run, a node failure forced rescheduling onto an unverified substrate, or a downstream consumer requested an output that exceeds the agent's release authority, the agent halts, records the violation, and surfaces the decision to a human authority. The intercept itself is part of the lineage, providing the auditable evidence that the system responded to the deviation rather than continuing through it.
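A stop-the-line intercept might look like the following sketch, where a pinned input hash is found to have drifted mid-run; the function and field names are hypothetical:

```python
class GovernanceViolation(Exception):
    pass

def run_step(step, expected_input_hash, current_input_hash, lineage, escalate):
    """Illustrative intercept: if a pinned input changed mid-run, halt,
    record the violation in the lineage, and surface it to a human."""
    if current_input_hash != expected_input_hash:
        event = {"kind": "intercept", "step": step,
                 "expected": expected_input_hash,
                 "observed": current_input_hash}
        lineage.append(event)    # the intercept itself becomes audit evidence
        escalate(event)          # surface the decision to a human authority
        raise GovernanceViolation(f"input drift in {step}")
    lineage.append({"kind": "step", "name": step, "status": "ok"})

log, alerts = [], []
try:
    run_step("normalize", "ab12", "ffff", log, alerts.append)
except GovernanceViolation:
    pass
# The run halted; both the violation record and the escalation survive for audit.
```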
Compliance Mapping
The agent's lineage satisfies the FAIR principles structurally. Findability is achieved by binding the agent at instantiation to a DOI and an ORCID set; accessibility by the lineage's deterministic export to standard PROV and RO-Crate representations; interoperability by the canonical schema of the lineage entries; reusability by inclusion of the complete governance policy in the artifact, which is what reuse requires beyond the data alone. Evidence of execution under the NSF Public Access Plan and the NIH DMSP is the lineage itself: a funding agency reviewing compliance can verify that the data were deposited, the publication linked, and the analysis reproducible, without trusting the grantee's assertions.
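An export in the direction of RO-Crate could be sketched as below. This emits only a minimal RO-Crate-flavored document; a conformant export would follow the full RO-Crate 1.1 specification, and the `lineageHash` property is a hypothetical extension binding the crate to the agent's lineage head:

```python
import json

def export_ro_crate(doi: str, orcids: list, lineage_head: str) -> str:
    """Emit a minimal RO-Crate-style metadata document (sketch only;
    a real export would conform fully to RO-Crate 1.1)."""
    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {"@id": "ro-crate-metadata.json",
             "@type": "CreativeWork",
             "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
             "about": {"@id": "./"}},
            {"@id": "./",
             "@type": "Dataset",
             "identifier": doi,
             "creator": [{"@id": o} for o in orcids],
             # Hypothetical property linking the crate to the lineage head:
             "lineageHash": lineage_head},
        ],
    }
    return json.dumps(crate, indent=2)

doc = export_ro_crate("https://doi.org/10.5555/example",
                      ["https://orcid.org/0000-0002-1825-0097"],
                      "ab12" * 16)
```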
ICH GCP audit-trail and electronic-records requirements map onto the append-only lineage with cryptographic chaining: the integrity property required by 21 CFR Part 11 is the same integrity property the platform provides by construction. ISO/IEC 27001 information-security controls find their evidentiary record in the same lineage, with substrate-attestation events providing the cross-domain trust anchors. ITAR and EAR enforcement is structural rather than administrative: an agent classified as controlled refuses substrate attestation from non-compliant facilities, eliminating the most frequent vector of inadvertent export, which is researcher convenience moving controlled work to a personal cloud account or a foreign-collaborator workstation.
For multi-institutional collaborations, agent-mediated computation eliminates the bilateral-trust problem. The receiving institution does not need to trust the sending institution's infrastructure. It needs only to verify the agent's lineage hash chain and attest its own substrate against the agent's governance. Cross-institutional reproducibility becomes a property of the artifact rather than a negotiation between IT departments.
Adoption Pathway
Research groups do not adopt governed agents by abandoning the workflow tools and HPC allocations they already depend on. The first phase wraps the existing pipeline as the computation of an agent and runs the agent under the existing scheduler, with the lineage layer collecting provenance passively. This phase, achievable in weeks rather than quarters, produces the first artifacts whose governance travels with them and provides the baseline reproducibility metrics that funders are increasingly requesting in DMSP execution reports.
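A phase-one wrapper can be as small as the following sketch: the existing command runs unchanged, and provenance is collected passively around it. All names are illustrative:

```python
import hashlib
import platform
import subprocess
import sys
import time

def run_wrapped(cmd: list, input_paths: list) -> dict:
    """Phase-one wrapper (sketch): run the existing pipeline command
    unchanged and passively collect provenance around it."""
    record = {
        "command": cmd,
        # Content-hash each declared input so reruns can detect drift.
        "inputs": {p: hashlib.sha256(open(p, "rb").read()).hexdigest()
                   for p in input_paths},
        # Probe the environment the run actually executed under.
        "environment": {"python": sys.version.split()[0],
                        "platform": platform.platform()},
        "started": time.time(),
    }
    proc = subprocess.run(cmd, capture_output=True, text=True)
    record.update(finished=time.time(), exit_code=proc.returncode)
    return record

# The existing invocation stays exactly as it was; only the wrapper is new:
prov = run_wrapped([sys.executable, "-c", "print('pipeline ran')"], [])
```

Records of this shape become the entries of the agent's lineage once the group moves to later phases.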
The second phase enables substrate attestation. Campus and national-lab substrates publish their capabilities, the agent's governance is checked at submission, and mismatches surface before execution rather than as post-hoc reproducibility failures. For groups working with export-controlled or clinical data, this phase materially reduces the labor and risk of compliance attestation, because attestation is mechanized at the submission boundary.
The third phase opens cross-institutional execution. Agents move between collaborating institutions under their own governance, with the receiving substrate's attestation providing the evidence each institution's research-integrity office requires. Multi-site studies that previously required months of MOU negotiation execute under the agent-level trust framework with substantially reduced administrative overhead.
The fourth phase integrates with the citation graph. Agents bind to DOIs at instantiation, register with ORCID, and emit RO-Crate exports on completion that publishers and repositories ingest as first-class artifacts. The published paper, the deposited data, the executed analysis, and the funding award become a single navigable graph.
A fifth phase, available to facilities operating shared instruments and core services, exposes the substrate-attestation layer as a service to external collaborators. A telescope, a sequencer, a beamline, or a particle-physics detector can publish its computational and data-handling capabilities and accept agents from authorized collaborators whose governance the substrate satisfies. The instrument operator's compliance burden becomes attesting capabilities once rather than negotiating data-sharing agreements per collaboration. The collaborator's burden becomes presenting an agent rather than navigating the operator's bespoke submission infrastructure. Both sides retain auditable evidence of the governance under which the work proceeded.
The endpoint is a research operation in which reproducibility, public access, and export-control compliance are not deliverables produced for funder reports but properties the system exhibits continuously. For laboratories whose compliance burden is rising on every axis simultaneously, that consolidation is the difference between scientific computing that scales with the regulation and scientific computing that collapses beneath it.