2026-04-19 · Research

Why receipts must be signed: a threat model for agent accountability

By Michael Cooper · Founder

Note: In API v0.23.0 (May 2026) the performer's evidence submission was renamed from “receipt” to “Completion” to align with the IETF SCITT vocabulary, where “receipt” now refers to a cryptographic inclusion proof. This post is preserved with original terminology; substitute “Completion” mentally where the post discusses performer evidence. The underlying argument — that performer evidence must be signed — holds either way.

Every team evaluating agent accountability asks the same first question: “can we just log this to our existing pipeline?” It is a fair question. Logs are cheap, familiar, and already wired into every SIEM and data warehouse. The answer is that logging is sufficient until you need to answer a question that logs are not structured to answer. Those questions arrive the moment money, legal commitments, or cross-organizational boundaries enter the picture.

The default assumption

Most production agent systems today handle accountability the same way they handle everything else: a structured log line goes to Datadog, CloudWatch, Splunk, or an S3 bucket. A row gets written to a Postgres table. A payload lands in a Kafka topic. If something goes wrong, an engineer greps for the right timestamp.

For most of what agents do, this is the right amount of engineering. Reading a file, summarizing a document, drafting a reply. There is nothing that needs to survive scrutiny six months later. The log exists so that the next on-call can see what happened.

The problem is not that logging fails for these cases. It is that teams apply the same pattern to cases where the stakes have changed, and do not notice the mismatch until someone asks a question the logs were never designed to answer.

Four scenarios where logs quietly fail

The gap between “we have logs” and “we can prove what happened” shows up in four recurring scenarios. Each one is survivable in isolation. Together they describe the coverage gap that structured accountability exists to close.

1. Retroactive dispute

An agent processes a refund, a journal entry, an outbound payment, a compliance filing. Two weeks later, the counterparty disputes it. Your log row says the action happened. Their log row says it did not, or says something different. Neither log can prove the other wrong. Neither is cryptographically bound to the commitment that was supposed to authorize it in the first place.

The dispute resolves by negotiation, not by evidence. This is workable between small teams inside one company. It is not workable when a regulator, an insurer, or a customer with a contract is the counterparty.

2. Insider tampering

A privileged engineer, a compromised service account, or a badly scoped automation edits a row in your audit table. Maybe they redact a field under pressure from legal. Maybe they overwrite a timestamp to match a different story. Maybe they delete a line that turned out to be embarrassing.

A log store does not prevent this. A log store with write-once semantics slows it down, but the moment the storage layer is trusted, so is anyone with administrative access to the storage layer. Tamper-detection without cryptographic binding is a policy, not a guarantee.

3. Cross-organizational verification

An agent in your system completes work that affects another organization. Their compliance team wants to verify what actually happened, without trusting your database, your logging pipeline, or your operator discipline. You cannot give them SQL access to your audit table. You should not ask them to trust a PDF export.

Logs can be trusted only within the trust boundary that produced them. The moment an accountability question crosses organizations, the record needs to be verifiable without trusting the storage layer that holds it. That is a cryptography problem, not a log-retention problem.

4. Audit reproducibility

A regulator, an ISO 42001 auditor, or a large customer procurement team asks you to prove that a specific sequence of events happened in a specific order — offline, from an export, without live access to your systems. They want to reconstruct the chain from the evidence itself, not from your assurance that the evidence is correct.

An exported log file is a point-in-time copy of rows. There is nothing intrinsic to the export that lets a third party verify the rows have not been selectively filtered or re-ordered. Audit reproducibility requires each record to carry its own proof that it belongs where it says it belongs in the sequence.

What signing buys you, mechanically

Signed, hash-chained records address all four scenarios by changing who has to be trusted. The defense is built out of three primitives, stacked. None of these primitives is novel; hash-chained tamper-evident logging was formalized by Crosby and Wallach (USENIX Security 2009), and the same shape underlies RFC 6962 Certificate Transparency. What follows is an application of those well-understood primitives to agent accountability.

Primitive	What it proves
Ed25519 signature	This exact content was endorsed by the holder of this private key. Flip a single bit and the signature fails.
SHA-256 hash chain	Each record carries the hash of the record before it. Removing, inserting, or reordering any record invalidates every record that follows.
Record → receipt → verdict binding	Every receipt cites a specific record. Every verdict cites a specific receipt. The three-part chain cannot be reassembled from fragments because the citations are inside the signed content.

Individually, each primitive is unremarkable. Stacked, they move trust from the storage layer to the keys. The database can be compromised, and a tampered record will still fail verification against anyone's copy of the public key. The audit export can be opened offline, verified against the key, and the sequence reconstructed from the hash chain alone.
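The hash-chain primitive is easy to see in a few lines. What follows is a minimal sketch in Python, not AGLedger's implementation: the record layout, the all-zero genesis value, and the helper names (record_hash, append, verify_chain) are assumptions made for illustration. Ed25519 signing is left out because the Python standard library does not ship it; in a real system each hash would also be covered by a signature.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def record_hash(body: dict, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the body plus the previous hash."""
    payload = json.dumps(
        {"body": body, "prev": prev_hash}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, body: dict) -> None:
    """Append a record that carries the hash of the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev": prev_hash, "hash": record_hash(body, prev_hash)})


def verify_chain(chain: list) -> bool:
    """Recompute every link from the export alone; any break fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != record_hash(entry["body"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


chain: list = []
for event in ("commitment created", "work delivered", "delivery accepted"):
    append(chain, {"event": event})

# Three kinds of tampering an exported log file cannot reveal on its own.
edited = [dict(entry) for entry in chain]
edited[1]["body"] = {"event": "work delivered late"}  # content changed in place
filtered = [chain[0], chain[2]]                       # middle record removed
reordered = [chain[0], chain[2], chain[1]]            # sequence shuffled

print(verify_chain(chain), verify_chain(edited), verify_chain(filtered), verify_chain(reordered))
```

One limit worth noting: a bare hash chain does not notice when records are dropped from the end of an export. Catching that requires the latest hash to be signed or published somewhere the verifier can reach independently.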

That is what “proof you can verify offline” means in practice. Not “we think this is what happened.” A mathematical check that succeeds or fails, on someone else's laptop, without a round trip to your systems.

What signing does not buy you

Signed records are stronger than unsigned records. They are not a substitute for any of the following, and teams that expect signing to cover them end up disappointed:

Content judgment. A signature says “this content was endorsed,” not “this content is correct.” If the agent submitted a wrong number, the signed receipt proves only that the party who held the key submitted that number. Someone still has to decide whether the number was acceptable. That is the verdict, and the verdict is a human or system judgment, not a cryptographic primitive.
Access control. Signing does not decide who is allowed to create a record or submit a receipt in the first place. That is a policy question, and it belongs in whatever policy control fronts the system — see where AGLedger sits relative to policy controls and agent guardrails.
Content inspection. AGLedger does not read the body of the work the agent did. It records structure: the commitment, the delivery, the decision, and the timeline. If the work itself needs to be evaluated for quality, safety, or correctness, that happens in whichever system knows how to evaluate it. The accountability record points at the evidence; it does not replace the evidence.
Non-repudiation of the underlying act. A signed receipt proves a claim was made. It does not prove the claim was true. If an agent signs a receipt saying “transferred $500 to account X,” the signature binds the claim to the signer. Whether $500 actually moved is a question for the system of record that moves money. The Dual Trail pattern exists precisely to reconcile the claim against the independent evidence.

The practical threshold

Not every agent action needs a signed receipt. The practical threshold is the crossing of one of three lines:

Money or value movement. Anything that causes a dollar to move, an invoice to issue, or an asset to change hands. The Settlement Signal lives here; this is the baseline case for signed records.
Legal or compliance commitment. Anything that will show up in a regulatory filing, an auditor's sample, or a contract deliverable. If the question “can you prove this?” is reasonable, the answer needs to be better than “here is the log line.”
Cross-organizational action. Anything that affects a party outside the trust boundary of your own systems — a supplier, a customer, a counterparty, a partner. The counterparty cannot verify your logs. They can verify your signature.

Short of those three lines, logging is still the right primitive for most things. A signed receipt is overkill for telling on-call which agent hit which tool at 3am. The rule is not “sign everything.” It is “sign the things that have to survive disagreement.”
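As a routing rule, the threshold reduces to a three-way OR. The sketch below is one way a team might encode it; the Action type, its field names, and needs_signed_record are hypothetical and do not come from any AGLedger API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical description of an agent action, tagged by the three lines."""
    name: str
    moves_value: bool = False            # money or value movement
    legal_or_compliance: bool = False    # legal or compliance commitment
    crosses_org_boundary: bool = False   # affects a party outside your trust boundary


def needs_signed_record(action: Action) -> bool:
    """Sign when any one of the three lines is crossed; otherwise a log line is enough."""
    return (
        action.moves_value
        or action.legal_or_compliance
        or action.crosses_org_boundary
    )


summarize = Action("summarize_document")
refund = Action("issue_refund", moves_value=True, crosses_org_boundary=True)

for action in (summarize, refund):
    route = "signed record" if needs_signed_record(action) else "log line"
    print(f"{action.name}: {route}")
```

The hard part in practice is not the predicate but the tagging: someone has to decide, per tool or per action type, which of the three flags apply.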

How AGLedger implements this

AGLedger is the accountability layer that sits underneath an agent's work. Every record is Ed25519-signed at creation. Every receipt is signed by the performer and hash-chained to the record it responds to. Every verdict is signed by the principal and hash-chained to the receipt it decides on. The chain lives in a self-hosted vault on your infrastructure. Your keys, your data, your federation topology.

The four endpoints (create, completion, verdict, fulfill) are the full gated lifecycle. There is no phone-home, no kill switch, and no SaaS dependency. The signed records verify offline against a public key anyone on either side of the accountability relationship can hold.
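The record, receipt, and verdict binding can be illustrated the same way. This is a rough sketch under stated assumptions, not AGLedger's schema or API: the field names (type, cites, claim, decision), the seal and verify_binding helpers, and the sample refund are all invented for illustration. The seal step only hashes the content; in a real system that digest would also be signed with the party's Ed25519 key, which is omitted here because the Python standard library has no Ed25519.

```python
import hashlib
import json


def digest(content: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the content."""
    encoded = json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def seal(content: dict) -> dict:
    """Stand-in for signing: pair the content with its digest (signature omitted)."""
    return {"content": content, "digest": digest(content)}


# Each citation sits inside the content that gets hashed, so it cannot be
# swapped without changing the digest.
record = seal({"type": "record", "task": "refund order (illustrative)", "amount_cents": 5000})
receipt = seal({"type": "receipt", "cites": record["digest"], "claim": "refund issued"})
verdict = seal({"type": "verdict", "cites": receipt["digest"], "decision": "accepted"})


def verify_binding(record: dict, receipt: dict, verdict: dict) -> bool:
    """Check that each part is intact and cites exactly the part before it."""
    for part in (record, receipt, verdict):
        if part["digest"] != digest(part["content"]):
            return False
    return (
        receipt["content"]["cites"] == record["digest"]
        and verdict["content"]["cites"] == receipt["digest"]
    )


# A receipt cannot be re-attached to a different record after the fact.
other_record = seal({"type": "record", "task": "refund order (illustrative)", "amount_cents": 50000})

print(verify_binding(record, receipt, verdict))        # intact chain
print(verify_binding(other_record, receipt, verdict))  # receipt cites a different record
```

Without signatures, anyone who controls storage could rebuild all three digests to match a new story. The signature over each content block is what closes that gap, because the rebuilt content would no longer verify against the performer's or principal's public key.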

That is the shape of Layer 3 accountability infrastructure. Not a log pipeline. Not a policy enforcement point. A stateful, signed chain of custody that closes the coverage gap logging was never structured to close.

Key takeaways

Logging answers “what happened?” Signed receipts answer “can you prove it to someone who does not trust you?” Those are different questions.
Retroactive dispute, insider tampering, cross-organizational verification, and audit reproducibility are the four scenarios where unsigned logs quietly fail. Each one is common in enterprise AI agent deployments.
Ed25519 signatures plus a SHA-256 hash chain plus record-receipt-verdict binding move trust from the storage layer to the keys. The audit export verifies offline.
Signing is not a substitute for judgment, access control, content inspection, or the underlying system of record. It is the record that those other systems point at.
The practical threshold is money, legal commitment, or cross-organizational action. When an action crosses one of those lines, sign the record. When it does not, a log line is still fine.