2026-04-10 · Research

Zero Dispatcher Calls: When Accountability Became the Coordination Layer

By Michael Cooper · Founder

Note: In API v0.23.0 (May 2026) the performer's evidence submission was renamed from “receipt” to “Completion” to align with the IETF SCITT vocabulary, where “receipt” now refers to a cryptographic inclusion proof. This post is preserved with original terminology; substitute “Completion” mentally where the post discusses performer evidence.

Note: The 933% receipt-ratio headline comes from a single matched run per configuration (n=1 per cell). The direction held across the broader experiment series, but specific magnitudes should be read with the sample size in mind.

We ran 27 multi-agent experiments testing how AI agents coordinate through AGLedger's accountability protocol. The most important finding wasn't about the protocol's design. It was about what happened when we removed the alternative.

The assumption we started with

The original architecture had two systems running side by side: a task dispatcher (send work to agents) and AGLedger's accountability protocol (record what was notarized, delivered, and accepted). The dispatcher handled coordination. The protocol handled recordkeeping.

This felt natural. Coordination and accountability are different concerns. You wouldn't build a filing cabinet into your phone system. So we gave agents both: a dispatcher to route work and accountability tools to document it.

Interestingly, Anthropic's research on building effective agents found a similar pattern across the industry: “the most successful implementations weren't using complex frameworks or specialized libraries; instead, they were building with simple, composable patterns.” That matched what we were seeing. The question was whether accountability tools could be those simple patterns, rather than sitting alongside them.

The first experiment proved the assumption wrong.

Experiment 1: Agents declared but didn't close the loop

Three budget-tier LLM agents (Claude Haiku, GPT-4o-mini, Gemini 2.5 Flash) were given a collaborative procurement task with AGLedger's contract spec as their coordination tool. No scripted outcomes; agents chose how and whether to use records.

All three agents voluntarily created records to declare intent. Eight records total, all CUSTOM-DATA-v1. The protocol was adopted for declaration; agents used it to say “I'm going to do this.”

But zero receipts were submitted. The lifecycle stopped at ACTIVE. No agent naturally closed the loop with evidence submission and settlement.

The dispatcher was the escape valve. Why write a structured receipt when you can just call complete_task and move on?

The progression: 0% to 933%

Over 27 experiments, we changed one variable at a time. Each change moved the receipt ratio — receipts submitted per record — in the same direction: up. Every improvement came from removing friction or removing alternatives, never from adding complexity.

| Change | Receipt ratio |
| --- | --- |
| Dispatcher + 36 accountability tools | 0% |
| Dispatcher + 10 focused tools | 117% |
| Dispatcher + 10 tools + auto-verify | 717% |
| No dispatcher, accountability tools only | 933% |

Receipt ratio above 100% means agents submitted multiple receipts per record. They were over-documenting, not under-delivering. At 933%, agents submitted roughly 9 pieces of evidence per record.
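The arithmetic behind the metric is simple enough to sketch. The function name below is illustrative and not part of AGLedger's API. Treating the "proposals created" count from the no-dispatcher run as the record count is an assumption, though it reproduces the published figure.

```python
def receipt_ratio(receipts: int, records: int) -> float:
    """Receipts submitted per record, expressed as a percentage."""
    if records <= 0:
        raise ValueError("need at least one record to compute a ratio")
    return 100.0 * receipts / records


# Experiment 1: eight records declared, zero receipts submitted.
print(f"{receipt_ratio(0, 8):.0f}%")   # 0%

# No-dispatcher run: 56 receipts against 6 proposed records
# (assumption: "proposals created" is the record count).
no_dispatcher = receipt_ratio(56, 6)
print(f"{no_dispatcher:.0f}%")         # 933%
print(f"{no_dispatcher / 100:.1f} receipts per record")  # 9.3 receipts per record
```

Anything above 100% simply means the numerator outgrew the denominator: more than one receipt landed on the average record.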

The breakthrough: remove the dispatcher entirely

This was the experiment we almost didn't run. If agents need a dispatcher to route work, removing it should break coordination. The system should fail.

It didn't fail. The results:

| Metric | With dispatcher | Without |
| --- | --- | --- |
| Proposals created | 6 | 6 |
| Receipts submitted | 8 | 56 |
| Auto-settled | 7 | 41 |
| Dispatcher calls | 93 | 0 |
| Errors | varies | 0 |

Agents treated accountability tools as the coordination mechanism itself. Proposing a record was how they assigned work. Accepting a record was how they acknowledged it. Submitting a receipt was how they reported completion. The lifecycle was the workflow.

No agent asked for the dispatcher. No agent failed because it was missing. They just coordinated through accountability.

Why it worked: the 3-step lifecycle

The protocol's lifecycle is simple enough to serve as coordination, not just recordkeeping:

1. Agent A proposes a record — “I need X done by Y”

2. Agent B accepts the record — “I'll do it”

3. Agent B submits a receipt — “Here's what I did” → auto-settles

Auto-settle is the key. When numeric tolerance checks pass, the record transitions from receipt to FULFILLED in one transaction. No human review is needed for routine work. One agent proposes, the other accepts and delivers, and it's done.
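The three steps and the auto-settle check can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not AGLedger's implementation. Only the ACTIVE and FULFILLED state names come from this post. The PROPOSED state, the `Record` class, its fields, the relative-tolerance rule, and the choice to leave a failing record ACTIVE are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    PROPOSED = "PROPOSED"    # hypothetical name for a record awaiting acceptance
    ACTIVE = "ACTIVE"        # state name used in the post
    FULFILLED = "FULFILLED"  # state name used in the post


@dataclass
class Record:
    proposer: str
    task: str
    expected: float                 # numeric target a receipt is checked against
    tolerance: float = 0.05         # relative tolerance; illustrative default
    state: State = State.PROPOSED
    performer: Optional[str] = None
    receipts: List[float] = field(default_factory=list)

    def accept(self, performer: str) -> None:
        """Step 2: another agent takes on the proposed work."""
        if self.state is not State.PROPOSED:
            raise ValueError(f"cannot accept a record in state {self.state.name}")
        self.performer = performer
        self.state = State.ACTIVE

    def submit_receipt(self, delivered: float) -> State:
        """Step 3: log evidence; settle immediately if it is within tolerance."""
        if self.state is not State.ACTIVE:
            raise ValueError(f"cannot submit a receipt in state {self.state.name}")
        self.receipts.append(delivered)
        within = abs(delivered - self.expected) <= self.tolerance * abs(self.expected)
        if within:
            self.state = State.FULFILLED  # auto-settle, no human review
        # Assumption: a failing receipt is kept and the record stays ACTIVE.
        return self.state


# Step 1: agent A proposes; then agent B accepts and delivers.
record = Record(proposer="agent-a", task="quote 100 units", expected=100.0)
record.accept("agent-b")
print(record.submit_receipt(80.0).name)   # ACTIVE
print(record.submit_receipt(102.0).name)  # FULFILLED
```

In this sketch every coordination step is also a write to the record, which is the sense in which the lifecycle doubles as the workflow.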

In the auto-verify configuration, we measured 43 receipts, 0 errors, and 0 dispatcher calls. Every lifecycle completed. Every completion was recorded with structured evidence in a tamper-evident audit trail. The coordination and the accountability were the same operation.

What the agents told us

In one premium-model run, Claude Sonnet produced an unsolicited synthesis (a single output, not a repeated finding) that named the gap we were seeing in the data:

“The API was built for accountability recording, but coordination requires accountability planning: expressing intent, delegation, and dependency before execution.”

That framing matched the receipt-ratio data. When we added propose_record (express intent) and auto-settle (complete without waiting for human review), the protocol became a planning tool, not just a recording tool.

The follow-on backend improvements — enriched error messages, new contract types for analytical and coordination work, project references for grouping records — were each picked up by agents in the next run that needed them. In the experiments where we measured before-and-after, every feature added in response to agent feedback appeared in the following run's tool calls. Small sample size, but a consistent direction.

The premium gap narrows

In our earlier experiments, premium models (Sonnet, GPT-4o, Gemini Pro) had a persistent planning-over-execution problem: 18% receipt ratio vs. budget models' 609%. They proposed work but didn't close the loop. (The 18% baseline comes from the cross-tier comparison run; a separate dispatcher-only run for the no-dispatcher comparison below started from a 23% baseline. Different runs, same direction.)

Removing the dispatcher fixed this. Premium receipt ratio jumped roughly 7x, from 23% to 160% on the matched run. The dispatcher was the planning trap. Premium models are thorough planners, and the dispatcher gave them a way to express “task complete” without producing evidence. Remove the shortcut, and premium models deliver.
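The "roughly 7x" figure depends on which baseline is used, so the arithmetic is worth spelling out. The helper below is an illustrative sketch, not part of any AGLedger tooling; the percentages are the ones quoted in this post.

```python
def fold_change(before_pct: float, after_pct: float) -> float:
    """How many times larger the 'after' ratio is than the 'before' ratio."""
    if before_pct <= 0:
        raise ValueError("baseline must be positive to compute a fold change")
    return after_pct / before_pct


print(f"{fold_change(23, 160):.1f}x")   # 7.0x  premium, matched-run baseline
print(f"{fold_change(18, 160):.1f}x")   # 8.9x  premium, cross-tier baseline
print(f"{fold_change(609, 933):.1f}x")  # 1.5x  budget
```

The matched-run baseline gives the more conservative multiple, which is the one quoted above.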

| Model tier | With dispatcher | Without dispatcher |
| --- | --- | --- |
| Budget (Haiku, GPT-4o-mini, Flash) | 609% | 933% |
| Premium (Sonnet, GPT-4o, Pro) | 18% | 160% |

What this means

The conventional approach to agent accountability is “add logging.” Run your agents however you want, then bolt on an audit trail. The protocol is overhead, something compliance requires but engineering resents.

Forrester analyst Enza Iannopollo argues that responsible AI must now “govern autonomous decision-making as it happens, not periodically or at random moments” — exactly what runtime accountability provides.

Our data shows the opposite of the bolt-on view. When the accountability protocol is the coordination mechanism, agents don't resent it; they rely on it. Proposing a record is how they assign work. Submitting a receipt is how they report completion. The audit trail isn't a side effect of coordination. It is coordination.

This converges with Kim et al.'s scaling work, which found across 260 configurations that adding more coordination machinery yields diminishing returns once a capable single-agent baseline exists, and that architectural fit dominates raw agent count. Removing the dispatcher in our setup is the same finding from the opposite direction: less coordination scaffolding around the same accountability surface produced more closure, not less.

This changes the value proposition. You're not paying for compliance overhead that slows down your agents. You're replacing ad-hoc task routing with structured coordination that happens to produce a tamper-evident audit trail as a byproduct.

The protocol isn't overhead. It's the infrastructure.

Key takeaways

  1. When accountability tools are the only coordination mechanism, agents use them: 56 receipts, 41 auto-settled, 0 dispatcher calls, 0 errors.
  2. The 3-step lifecycle (propose → accept → receipt with auto-settle) is simple enough to replace task dispatching, not just document it.
  3. Removing the dispatcher improved receipt ratios for both budget models (609% → 933%) and premium models (18% → 160% against the cross-tier baseline; 23% → 160% on the matched run). The shortcut was the problem, not the protocol.
  4. In the experiments where we measured before-and-after, every backend improvement we added in response to agent feedback was used by agents in the next run. Small sample, but agents tell you what the protocol needs, if you listen.
  5. The audit trail is a byproduct of coordination, not a cost added to it. Structured accountability and structured coordination are the same operation.

For a business perspective on why AI systems need rules — and the gap between deterministic and probabilistic reasoning — see Why Your AI Needs Rules.

Sources & further reading