2026-04-19 · Research

Why receipts must be signed: a threat model for agent accountability

By Michael Cooper · Founder

Every team evaluating agent accountability asks the same first question: “can we just log this to our existing pipeline?” It is a fair question. Logs are cheap, familiar, and already wired into every SIEM and data warehouse. The answer is that logging is sufficient until you need to answer a question that logs are not structured to answer — and those questions arrive the moment money, legal commitments, or cross-organizational boundaries enter the picture.

The default assumption

Most production agent systems today handle accountability the same way they handle everything else: a structured log line goes to Datadog, CloudWatch, Splunk, or an S3 bucket. A row gets written to a Postgres table. A payload lands in a Kafka topic. If something goes wrong, an engineer greps for the right timestamp.

For most of what agents do, this is the right amount of engineering. Reading a file, summarizing a document, drafting a reply — there is nothing that needs to survive scrutiny six months later. The log exists so that the next on-call can see what happened.

The problem is not that logging fails for these cases. It is that teams apply the same pattern to cases where the stakes have changed, and do not notice the mismatch until someone asks a question the logs were never designed to answer.

Four scenarios where logs quietly fail

The gap between “we have logs” and “we can prove what happened” shows up in four recurring scenarios. Each one is survivable in isolation. Together they describe the coverage gap that structured accountability exists to close.

1. Retroactive dispute

An agent processes a refund, a journal entry, an outbound payment, a compliance filing. Two weeks later, the counterparty disputes it. Your log row says the action happened. Their log row says it did not — or says something different. Neither log can prove the other wrong. Neither is cryptographically bound to the commitment that was supposed to authorize it in the first place.

The dispute resolves by negotiation, not by evidence. This is workable between small teams inside one company. It is not workable when a regulator, an insurer, or a customer with a contract is the counterparty.

2. Insider tampering

A privileged engineer, a compromised service account, or a badly-scoped automation edits a row in your audit table. Maybe they redact a field under pressure from legal. Maybe they overwrite a timestamp to match a different story. Maybe they delete a line that turned out to be embarrassing.

A log store does not prevent this. A log store with write-once semantics slows it down, but the moment the storage layer is trusted, so is anyone with administrative access to the storage layer. Tamper-detection without cryptographic binding is a policy, not a guarantee.

3. Cross-organizational verification

An agent in your system completes work that affects another organization. Their compliance team wants to verify what actually happened — without trusting your database, your logging pipeline, or your operator discipline. You cannot give them SQL access to your audit table. You should not ask them to trust a PDF export.

Logs are addressable only within the trust boundary that produced them. The moment an accountability question crosses organizations, the record needs to be verifiable without trusting the storage layer that holds it. That is a cryptography problem, not a log-retention problem.

4. Audit reproducibility

A regulator, an ISO 42001 auditor, or a large customer procurement team asks you to prove that a specific sequence of events happened in a specific order — offline, from an export, without live access to your systems. They want to reconstruct the chain from the evidence itself, not from your assurance that the evidence is correct.

An exported log file is a point-in-time copy of rows. There is nothing intrinsic to the export that lets a third party verify the rows have not been selectively filtered or re-ordered. Audit reproducibility requires each record to carry its own proof that it belongs where it says it belongs in the sequence.
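One way a record can carry its own proof of position is a hash chain, sketched below with only the Python standard library. The record layout (`body`, `prev_hash`, `hash`), the all-zero genesis value, and the JSON canonicalization are assumptions made for illustration, not a description of any particular product's format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder predecessor hash for the first record


def record_hash(body: dict, prev_hash: str) -> str:
    """Hash a record body together with the hash of its predecessor."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(chain: list, body: dict) -> None:
    """Append a record that commits to the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"body": body, "prev_hash": prev_hash,
                  "hash": record_hash(body, prev_hash)})


def verify_export(chain: list) -> bool:
    """Recompute every link; an edit, a dropped row, or a reorder breaks one."""
    prev_hash = GENESIS
    for rec in chain:
        if rec["prev_hash"] != prev_hash:
            return False
        if rec["hash"] != record_hash(rec["body"], prev_hash):
            return False
        prev_hash = rec["hash"]
    return True


chain = []
for step in ("refund requested", "refund approved", "refund paid"):
    append(chain, {"event": step})

filtered = [chain[0], chain[2]]             # middle row silently dropped
reordered = [chain[1], chain[0], chain[2]]  # first two rows swapped

print(verify_export(chain), verify_export(filtered), verify_export(reordered))
```

A chain of bare hashes has limits. Anyone who rewrites every record can recompute it, and it cannot show that trailing records were cut off. In practice the latest hash would also need to be signed or published somewhere the exporter does not control.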

What signing buys you, mechanically

Signed, hash-chained records address all four scenarios by changing who has to be trusted. The defense is built out of three primitives, stacked:

| Primitive | What it proves |
| --- | --- |
| Ed25519 signature | This exact content was endorsed by the holder of this private key. Flip a single bit and the signature fails. |
| SHA-256 hash chain | Each record carries the hash of the record before it. Removing, inserting, or reordering any record invalidates every record that follows. |
| Mandate → receipt → verdict binding | Every receipt cites a specific mandate. Every verdict cites a specific receipt. The three-part chain cannot be reassembled from fragments because the citations are inside the signed content. |

Individually, each primitive is unremarkable. Stacked, they move trust from the storage layer to the keys. The database can be compromised, and an altered record will still fail verification against any independently held copy of the public key. The audit export can be opened offline, verified against the key, and the sequence reconstructed from the hash chain alone.

That is what “proof you can verify offline” means in practice. Not “we think this is what happened.” A mathematical check that succeeds or fails, on someone else's laptop, without a round trip to your systems.
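The mandate → receipt → verdict binding can be sketched the same way. In this minimal example each record cites the SHA-256 digest of the record before it, and that citation sits inside the content a signature would cover. The field names are hypothetical, and the Ed25519 signing step itself is left out so that the sketch runs on the standard library alone.

```python
import hashlib
import json


def digest(content: dict) -> str:
    """SHA-256 over a canonical JSON form: the bytes a signature would cover."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical records; each citation is part of the record's own content.
mandate = {"type": "mandate", "task": "refund order 1001", "limit_usd": 500}
receipt = {"type": "receipt", "mandate": digest(mandate), "claimed_usd": 500}
verdict = {"type": "verdict", "receipt": digest(receipt), "decision": "accepted"}


def binding_holds(mandate: dict, receipt: dict, verdict: dict) -> bool:
    """Check that each record cites the exact content of the one before it."""
    return (receipt.get("mandate") == digest(mandate)
            and verdict.get("receipt") == digest(receipt))


# A verdict lifted onto a different receipt no longer matches its citation.
other_receipt = dict(receipt, claimed_usd=5000)

print(binding_holds(mandate, receipt, verdict))        # citations line up
print(binding_holds(mandate, other_receipt, verdict))  # verdict cites another receipt
```

Because the citation is part of the signed bytes, swapping in a different mandate or receipt would require forging a new signature rather than just rearranging existing records.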

What signing does not buy you

Signed records are stronger than unsigned records. They are not a substitute for any of the following, and teams that conflate the two end up disappointed:

  • Content judgment. A signature says “this content was endorsed,” not “this content is correct.” If the agent submitted a wrong number, the signed receipt proves that the party who held the key submitted that number. It does not prove intent, and it does not make the number right. Someone still has to decide whether the number was acceptable. That is the verdict, and the verdict is a human or system judgment, not a cryptographic primitive.
  • Access control. Signing does not decide who is allowed to create a mandate or submit a receipt in the first place. That is a policy question — a Layer 1 concern in the three-layer stack — and it belongs in whatever policy control fronts the system.
  • Content inspection. AGLedger does not read the body of the work the agent did. It records structure: the commitment, the delivery, the decision, and the timeline. If the work itself needs to be evaluated for quality, safety, or correctness, that happens in whichever system knows how to evaluate it. The accountability record points at the evidence; it does not replace the evidence.
  • Proof of the underlying act. A signed receipt proves a claim was made. It does not prove the claim was true. If an agent signs a receipt saying “transferred $500 to account X,” the signature binds the claim to the signer. Whether $500 actually moved is a question for the system of record that moves money. The Dual Trail pattern exists precisely to reconcile the claim against the independent evidence.
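The last point, reconciling a signed claim against independent evidence, can be illustrated with a small comparison. The article names the Dual Trail pattern without specifying it, so everything here (the `Claim` and `LedgerEntry` types, their fields, the matching rule) is an assumption about what a reconciliation step might look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """What a signed receipt asserts (hypothetical fields)."""
    reference: str
    account: str
    amount_cents: int


@dataclass(frozen=True)
class LedgerEntry:
    """What the system of record shows actually moved (hypothetical fields)."""
    reference: str
    account: str
    amount_cents: int


def reconcile(claims, ledger):
    """Return (reference, reason) for every claim the ledger does not back up."""
    by_ref = {entry.reference: entry for entry in ledger}
    mismatches = []
    for claim in claims:
        entry = by_ref.get(claim.reference)
        if entry is None:
            mismatches.append((claim.reference, "no matching ledger entry"))
        elif (entry.account, entry.amount_cents) != (claim.account, claim.amount_cents):
            mismatches.append((claim.reference, "ledger entry differs from claim"))
    return mismatches


claims = [Claim("tx-1", "X", 50_000), Claim("tx-2", "Y", 12_000), Claim("tx-3", "Z", 900)]
ledger = [LedgerEntry("tx-1", "X", 50_000), LedgerEntry("tx-2", "Y", 10_000)]
result = reconcile(claims, ledger)
print(result)
```

The signature establishes who made each claim. The comparison against the ledger is what establishes whether the claim matches what happened.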

The practical threshold

Not every agent action needs a signed receipt. The practical threshold is the crossing of one of three lines:

  1. Money or value movement. Anything that causes a dollar to move, an invoice to issue, or an asset to change hands. The Settlement Signal lives here; this is the baseline case for signed records.
  2. Legal or compliance commitment. Anything that will show up in a regulatory filing, an auditor's sample, or a contract deliverable. If the question “can you prove this?” is reasonable, the answer needs to be better than “here is the log line.”
  3. Cross-organizational action. Anything that affects a party outside the trust boundary of your own systems — a supplier, a customer, a counterparty, a partner. The counterparty cannot verify your logs. They can verify your signature.

Outside those three lines, logging is still the right primitive for most things. A signed receipt is overkill for telling on-call which agent hit which tool at 3am. The rule is not “sign everything”; it is “sign the things that have to survive disagreement.”
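The three-line threshold can be expressed as a small routing predicate. This is a sketch under assumptions: the `Action` type and its flag names are hypothetical, and in a real system something upstream would have to classify each action before it reaches this check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical description of an agent action; field names are assumptions."""
    name: str
    moves_value: bool = False
    legal_or_compliance: bool = False
    crosses_org_boundary: bool = False


def needs_signed_receipt(action: Action) -> bool:
    """True when the action crosses any one of the three lines."""
    return (action.moves_value
            or action.legal_or_compliance
            or action.crosses_org_boundary)


actions = [
    Action("summarize_document"),
    Action("issue_refund", moves_value=True),
    Action("submit_compliance_filing", legal_or_compliance=True),
    Action("notify_supplier", crosses_org_boundary=True),
]

# Everything else stays on the ordinary logging path.
to_sign = [a.name for a in actions if needs_signed_receipt(a)]
print(to_sign)
```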

How AGLedger implements this

AGLedger is the accountability layer that sits underneath an agent's work. Every mandate is Ed25519-signed at creation. Every receipt is signed by the performer and hash-chained to the mandate it responds to. Every verdict is signed by the principal and hash-chained to the receipt it decides on. The chain lives in a self-hosted vault on your infrastructure. Your keys, your data, your federation topology.

The four endpoints — create, receipt, verdict, fulfill — are the full AOAP lifecycle. There is no phone-home, no kill switch, and no SaaS dependency. The signed records verify offline against a public key anyone on either side of the accountability relationship can hold.

That is the shape of Layer 3 accountability infrastructure. Not a log pipeline. Not a policy enforcement point. A stateful, signed chain of custody that closes the coverage gap logging was never structured to close.

Key takeaways

  1. Logging answers “what happened?” Signed receipts answer “can you prove it to someone who does not trust you?” Those are different questions.
  2. Retroactive dispute, insider tampering, cross-organizational verification, and audit reproducibility are the four scenarios where unsigned logs quietly fail. Each one is common in enterprise AI agent deployments.
  3. Ed25519 signatures plus a SHA-256 hash chain plus mandate-receipt-verdict binding move trust from the storage layer to the keys. The audit export verifies offline.
  4. Signing is not a substitute for judgment, access control, content inspection, or the underlying system of record. It is the record that those other systems point at.
  5. The practical threshold is money, legal commitment, or cross-organizational action. Inside those lines, sign the record. Outside them, a log line is still fine.
