AGI-ERA · SELF-MODIFYING AI

PLENA AI Self-Modification & Hyperagent Attestation

Tamper-evident receipts of what a self-modifying agent did, under which version of its own code, with what authorization — built before the deployment, not after. Forward-positioned attestation infrastructure for hyperagents and other self-modifying AI systems, in advance of the deployment maturity that will produce the first consequential-harm cases.


Opening problem

In March 2026, Meta and collaborators published research introducing hyperagents — self-referential AI agents that integrate two things into a single editable program: a task agent that solves the target problem, and a meta agent that modifies both itself and the task agent. The architectural shift is that the mechanism responsible for generating improvements is itself subject to improvement. Self-modification extends not only to the agent's task-solving code but to the procedure by which the agent modifies itself.

As of this writing, hyperagents remain at the research-paper stage. The framework requires sandboxed execution environments, and the authors have warned that the system may behave destructively in unsandboxed contexts. Production deployment timelines are uncertain. For similar AI architectures, the path from research paper to consequential real-world deployment has historically run two to four years.

A substantial documentation gap will form on the same timeline as deployment maturity. When self-modifying agents begin making consequential decisions (financial transactions, public-service administration, contract execution, medical recommendations, governance actions), the question of accountability will turn on documentation that does not yet exist. Under which version of its own code did the agent take this action? What was the agent's full state at the moment of the decision? Who authorized the agent to act, and within what scope? When did the agent last modify itself, and what triggered the modification? Who is responsible when a self-modifying agent makes a consequential error: the human principal, the deploying organization, or the model developer?

The attestation gap will not be filled by the AI labs themselves. The labs' incentives favor opacity over verifiability of self-modification events; the legal regimes that will eventually require accountability are still forming; the standardized documentation infrastructure that allows multi-stakeholder verification across labs, regulators, and harmed parties does not exist.

PLENA AI Self-Modification & Hyperagent Attestation establishes the receipt infrastructure for this category, in advance of the deployment maturity. The architecture extends PLENA's existing receipt-grammar foundation to the specific evidentiary needs of self-modifying systems: state-attestation at decision moments, modification-event receipts that anchor the agent's code version at the time of action, authorization-chain documentation, and multi-forum handover for safety, regulatory, and liability inquiries.

This is forward-positioned infrastructure. The page is published now because the right time to establish category prior-art is before the category forms, not after the first consequential-harm case reaches court. The page acknowledges its forward-positioned nature openly. It does not claim that hyperagents are deployed at scale; it claims that PLENA's architecture is the right foundation for their attestation when deployment maturity arrives.

Five workflows PlenaProof covers

Each workflow produces four artifacts: a sealed declaration, the underlying evidence, refresh discipline across the agent's operational arc, and a multilingual handover packet calibrated to safety / regulatory / liability inquiries.

State-Attestation at Consequential Decision Moments

Documentation of the agent's full state at the moment of a significant decision: model version, training corpus identifier, configuration parameters, prompt context, available tools, environment variables, safety-monitor status. The complete state vector is hashed and anchored. This record is critical for later inquiry into why a particular agent made a particular consequential choice.

State Declaration.
Sealed State Evidence. Hash commitment to the full state-vector at decision time.
Authorization-Scope Attestation showing the decision was within the agent's delegated authority.
Multilingual Handover Packet.
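
A minimal sketch of what a hash commitment to a state vector could look like, assuming the state can be serialized as JSON and that SHA-256 over a canonical encoding is an acceptable commitment. The function name commit_state and every field name and value below are hypothetical illustrations, not PLENA's published schema.

```python
import hashlib
import json


def commit_state(state: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of `state`."""
    # Sorted keys and fixed separators make the encoding independent of
    # dict insertion order, so the same state always hashes the same way.
    canonical = json.dumps(
        state, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical state vector; all field names and values are illustrative.
state = {
    "model_version": "agent-v17",
    "training_corpus_id": "corpus-2026-01",
    "config": {"temperature": 0.2},
    "prompt_context_sha256": hashlib.sha256(b"prompt text").hexdigest(),
    "tools": ["ledger.read", "ledger.write"],
    "safety_monitor": "active",
}

commitment = commit_state(state)

# Any later change to the recorded state yields a different digest.
tampered = dict(state, safety_monitor="disabled")
assert commit_state(tampered) != commitment
```

Large inputs such as the prompt context are represented here by their own digests, so the commitment stays small while still binding the full content.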

Modification-Event Receipts

Whenever the hyperagent modifies its own task-solving code or its own modification mechanism, structured documentation of the modification: the prior version's cryptographic hash, the new version's cryptographic hash, the trigger condition that initiated the modification, the modification's scope, and any safety evaluation conducted before or after the modification. These receipts build the unbroken chain that future investigators can trace to determine which version of the agent took which action.

Modification Declaration.
Sealed Modification Evidence.
Pre-and-Post Hashes committing the modification chain.
Multilingual Handover Packet.
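
One way the pre-and-post hashes could be linked into a chain is sketched below, assuming each receipt also commits to the digest of the receipt before it. The class ModificationReceipt, the helpers append_receipt and verify_chain, and the GENESIS value are hypothetical names chosen for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

GENESIS = "0" * 64  # stand-in "previous receipt" digest for the first entry


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ModificationReceipt:
    prior_code_hash: str    # hash of the agent's code before the change
    new_code_hash: str      # hash of the agent's code after the change
    trigger: str            # what initiated the modification
    scope: str              # e.g. "task_agent" or "meta_agent" (illustrative)
    prev_receipt_hash: str  # digest of the preceding receipt in the chain

    def digest(self) -> str:
        body = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return sha256_hex(body.encode("utf-8"))


def append_receipt(chain, prior_code: bytes, new_code: bytes, trigger: str, scope: str):
    """Return a new chain with one more receipt linked to the last one."""
    prev = chain[-1].digest() if chain else GENESIS
    receipt = ModificationReceipt(
        sha256_hex(prior_code), sha256_hex(new_code), trigger, scope, prev
    )
    return chain + [receipt]


def verify_chain(chain) -> bool:
    """Check receipt-to-receipt links and code-version continuity."""
    prev = GENESIS
    for i, receipt in enumerate(chain):
        if receipt.prev_receipt_hash != prev:
            return False
        if i > 0 and receipt.prior_code_hash != chain[i - 1].new_code_hash:
            return False
        prev = receipt.digest()
    return True


v1, v2, v3 = b"def solve(): return 1", b"def solve(): return 2", b"def solve(): return 3"
chain = append_receipt([], v1, v2, "benchmark regression", "task_agent")
chain = append_receipt(chain, v2, v3, "scheduled self-review", "meta_agent")

# Rewriting an earlier receipt breaks the link held by the next one.
forged = [replace(chain[0], trigger="none"), chain[1]]
```

In this sketch the chain alone cannot reveal a rewrite of the most recent receipt; detecting that would require the latest digest to be anchored somewhere outside the chain.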

Authorization-Chain Documentation

For each consequential action, the chain of authorization that permitted the agent to take it: the human principal who authorized initial deployment, the organization operating the agent, the scope of the agent's delegated authority, expiry or scope-change conditions, the human or organizational party with override authority. Critical for liability questions where the legal regime requires identifying a responsible human or accountable organization behind any AI-caused consequence.

Authorization Declaration.
Sealed Authority Evidence.
Scope-Limits Attestation defining the boundary of delegated authority.
Multilingual Handover Packet.
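
The scope-limits idea can be sketched as a simple check of an action against a recorded delegation. This assumes authority can be expressed as a set of permitted actions, a single numeric limit, and an expiry time, which real delegations may not reduce to. The Delegation class, its fields, and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Delegation:
    principal: str              # human who authorized initial deployment
    operator: str               # organization operating the agent
    override_authority: str     # party able to override the agent
    allowed_actions: frozenset  # actions inside the delegated scope
    max_amount: float           # illustrative numeric scope limit
    expires_at: datetime        # expiry of the delegation


def within_scope(grant: Delegation, action: str, amount: float, at: datetime) -> bool:
    """True only if the action is permitted, under the limit, and unexpired."""
    return (
        at < grant.expires_at
        and action in grant.allowed_actions
        and amount <= grant.max_amount
    )


grant = Delegation(
    principal="j.doe",
    operator="Example Bank",
    override_authority="risk-committee",
    allowed_actions=frozenset({"approve_loan", "decline_loan"}),
    max_amount=50_000.0,
    expires_at=datetime(2027, 1, 1, tzinfo=timezone.utc),
)

decision_time = datetime(2026, 6, 1, tzinfo=timezone.utc)
in_scope = within_scope(grant, "approve_loan", 20_000.0, decision_time)
over_limit = within_scope(grant, "approve_loan", 75_000.0, decision_time)
```

Recording the result of such a check alongside the delegation it was evaluated against is what lets a later reader see both the boundary and which side of it the action fell on.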

Performance and Behavior Pattern Attestation

Periodic refresh documenting the agent's behavior patterns across time: detected drift from baseline, anomalous decisions flagged by safety monitors, distribution of decisions across categories, error rates, override events. The continuous record that distinguishes "the agent has always behaved this way" from "the agent's behavior changed over time, and this is when."

Behavior Declaration.
Sealed Behavior Evidence Archive.
Drift and Anomaly Yearbook.
Multilingual Handover Packet.
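
As an illustration of "detected drift from baseline", the sketch below compares the distribution of decisions across categories in two periods using total variation distance. The choice of metric, the 0.2 threshold, and the category names are assumptions made for the example; the source does not specify how drift is measured.

```python
from collections import Counter


def distribution(decisions):
    """Map each decision category to its share of all decisions."""
    if not decisions:
        raise ValueError("at least one decision is required")
    counts = Counter(decisions)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation(p, q):
    """Half the sum of absolute differences between two distributions."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Illustrative decision logs for a baseline period and a later period.
baseline = ["approve"] * 70 + ["deny"] * 30
current = ["approve"] * 40 + ["deny"] * 50 + ["escalate"] * 10

drift = total_variation(distribution(baseline), distribution(current))

DRIFT_THRESHOLD = 0.2  # hypothetical value; a real deployment would choose its own
flagged = drift > DRIFT_THRESHOLD
```

A distance of 0 means the two periods have identical category shares and 1 means they share no categories at all, so a dated series of such values is one way to show when behavior changed.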

Multi-Forum Handover for Safety, Regulatory, and Liability Inquiries

Structured packets calibrated to receiving forums: internal AI-safety auditors, external red-team and audit firms, AI-governance regulators (national competent authorities designated under Article 70 of the EU AI Act, US NIST AI Risk Management Framework auditors, UK AI Safety Institute, equivalent emerging regimes globally), plaintiffs' counsel in AI-caused-harm civil cases, criminal prosecutors investigating AI-caused harm, journalists investigating AI-system behavior, the deploying organization's own safety teams. The same underlying receipts presented in the format each forum recognizes.

Multi-Forum Declaration.
Forum-Specific Compilations.
Cross-Forum Continuity across multiple inquiries pursuing the same matter.
Multilingual Handover Packet.

Institutional version

API access and enterprise integration for AI labs deploying self-modifying systems, AI infrastructure companies, AI-governance regulators, AI-safety audit firms, and AI-system-affected parties. Pricing by negotiation as the market matures.

Target institutional partners as the category matures: frontier AI labs (Anthropic, OpenAI, DeepMind, Meta AI, others) developing self-modifying systems internally; AI-safety audit firms (Apollo Research, METR, Anthropic's own safety practice, equivalents emerging); AI-governance regulators (EU AI Office, US NIST, UK AISI, equivalents); academic AI-safety research centers (CHAI Berkeley, FHI's successors, CSER Cambridge, equivalents globally); AI-liability law firms as the category emerges.

The institutional market for hyperagent attestation will form on the deployment-maturity timeline. The institutional offering is documented here to establish the architecture; pricing and pilot terms will firm up as the category matures and the first consequential-harm cases produce demand for the infrastructure.

The 100-Year Operating Commitment

AI-caused harm cases may unfold across years and decades. The agent that made a consequential decision in 2027 may be subject to inquiry in a 2035 civil case, a 2040 retrospective regulatory review, a 2050 historical-accountability process.

This product is built for the multi-decade arc of AI accountability inquiry. Where actually implemented and populated, the intended architecture replicates each artifact across multiple independent archives, anchors it cryptographically to public records that do not depend on any single jurisdiction or AI lab, and makes it verifiable offline by inquiry parties. Receipts are designed to survive the AI lab's reorganization, the deployment company's dissolution, the regulator's policy changes, and the multi-administration arc of governmental AI accountability.
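
One common way to anchor many receipts to a single public record and still verify any one of them offline is a Merkle tree. The sketch below is offered under that assumption; the source does not state that PLENA uses Merkle trees, and the function names are hypothetical. Only the root would need to be published; a holder keeps the receipt plus a short proof.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Pair adjacent nodes; an odd node out is paired with itself.
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(receipts):
    """Root digest committing to every receipt in the batch."""
    if not receipts:
        raise ValueError("at least one receipt is required")
    level = [_h(b"\x00" + r) for r in receipts]  # prefix separates leaves from nodes
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def inclusion_proof(receipts, index):
    """Sibling hashes needed to recompute the root from one receipt."""
    level = [_h(b"\x00" + r) for r in receipts]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))  # (hash, sibling_is_left)
        level = _next_level(level)
        index //= 2
    return proof


def verify_offline(receipt, proof, root):
    """Recompute the root from the receipt and proof; no network access needed."""
    node = _h(b"\x00" + receipt)
    for sibling, sibling_is_left in proof:
        node = _h(b"\x01" + sibling + node) if sibling_is_left else _h(b"\x01" + node + sibling)
    return node == root


receipts = [b"receipt-a", b"receipt-b", b"receipt-c"]  # illustrative payloads
root = merkle_root(receipts)
proof_c = inclusion_proof(receipts, 2)
```

Duplicating an odd node is a simplification with known edge cases, so this should be read as an illustration of the verification flow, not a hardened design.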

Read the full 100-Year Commitment

Why this differs from AI-lab internal documentation

AI labs maintain internal documentation of model versions, training runs, deployment events, and safety evaluations. The labs' documentation is institution-controlled, subject to confidentiality protections, accessible during litigation only through formal discovery, and trustworthy to the extent that the lab's interests align with the inquiry — which they often do not.

PLENA AI Self-Modification & Hyperagent Attestation operates on the opposite principle: receipts held by the deploying organization, the affected party, and the regulatory body, independent of the AI lab's internal records. The deploying organization (the bank using a hyperagent for loan decisions, the hospital using one for triage, the government agency using one for benefit eligibility) retains its own receipts of what the agent did, under which version, with what authorization — independent of the lab that built the model.

Deploying-organization-controlled

Rather than lab-controlled.

Contemporaneous capture

At the decision moment rather than retrospective reconstruction.

Tamper-evident

Hashed so any later change is detectable, without relying on an internal database you must trust.

Multi-forum handover

From one source rather than re-collection per inquiry.

The holder keeps their own record

A structured, tamper-evident copy the holder retains through the AI lab's reorganization, the deployment company's dissolution, or the regulator's policy changes. (Integrity-only: PLENA does not guarantee a record will withstand a hostile actor.)

Verifiable offline

By inquiry parties without ongoing cooperation from the lab or platform.

Existing instruments this complements

EU AI Act (Regulation (EU) 2024/1689), particularly Articles 26 (deployer obligations), 50 (transparency), and 72 (post-market monitoring)
US NIST AI Risk Management Framework and the AI Risk Management Framework Generative AI Profile
UK AI Safety Institute evaluation frameworks
ISO/IEC 23894:2023 AI Risk Management
Equivalent emerging regimes globally
The broader AI-safety and AI-governance literature
The PLENA white paper Beyond the Will

What this does not do

PLENA AI Self-Modification & Hyperagent Attestation does not certify AI safety. It does not evaluate hyperagent behavior. It does not constitute an AI audit. It does not provide AI-governance legal advice. It does not adjudicate AI-caused harm cases. It does not regulate AI labs or their deployments. It does not predict AI-system behavior. It produces tamper-evident receipts of what a self-modifying agent's state was at decision moments, what modifications it made and when, under whose authorization, with what observed behavior pattern — receipts that survive institutional and corporate change and remain available to inquiry parties across the multi-decade arc of AI accountability.

Languages

Launches in PLENA's 9 live languages. The page acknowledges that AI-governance vocabulary varies across jurisdictions and provides translation notes for the major regulatory frameworks (EU AI Act, NIST AI RMF, UK AISI methodology) in each language. Contact hello@joinplena.com for translator inquiries.

Scholarship and norms

This product is built in conversation with:

The Meta hyperagents paper (Zhang et al., March 2026) and the Darwin Gödel Machine literature (Zhang et al., 2025)
The broader AI-safety alignment literature (Bostrom, Christiano, Hadfield-Menell, Carlsmith)
AI-governance scholarship (Floridi, Cath, Coeckelbergh)
EU AI Act commentary and US NIST AI RMF practitioner guidance
The PLENA white paper Beyond the Will

Related PLENA receipt grammar

Agent-to-Agent Accountability
AI Training & Deployment
Corporate Anti-Corruption Compliance
Procurement & Aid-Chain Attestation
Whistleblower Documentation
Journalism Source Protection
Deepfake Victim Documentation
Student Authorship Attestation
Translation Roadmap

For frontier AI labs, AI-safety audit firms, AI-governance regulators, and academic AI-safety research centers

PlenaProof welcomes research-partnership conversations with internal safety practices at Anthropic, OpenAI, DeepMind, and Meta AI; with Apollo Research and METR; with the EU AI Office, US NIST, and UK AISI; with CHAI Berkeley, FHI successors, and CSER Cambridge; and with AI-liability law firms as the category emerges.

Research partnership inquiry: hello@joinplena.com