2026-04-13
Research
What We Learned Adapting to Google A2A v1.0
27 experiments. 30 multi-agent runs. 3 LLM providers. One finding that changed how we think about agent interfaces.
Summary
Google released the Agent-to-Agent protocol (A2A) v1.0 on March 12, 2026. We had been running multi-agent experiments against A2A since the preview announcement. When v1.0 shipped, we adapted AGLedger's A2A integration, ran 27 behavioral experiments across Claude Haiku, Gemini 2.5 Flash, and GPT-4o-mini, and found a structural gap: A2A handles task delegation, but agents that use it naturally declare intent without ever closing the accountability loop. The fix is not prompting — it is interface design.
The protocol landscape
The agent ecosystem has converged on a stack. Each protocol solves a different problem:
Discovery — Agent Cards / AGENTS.md — “what can you do?”
Coordination — A2A — “do this task”
Tools — MCP — “call this function”
Commerce — UCP — “buy this item”
Payment auth — AP2 — “you may spend $X”
Accountability — AGLedger — “the work is done — here is proof”
A2A and AGLedger overlap in task lifecycle management. They diverge on what happens after the work is done. A2A: the agent self-reports completion. AGLedger: an independent principal renders a verdict.
What changed in A2A v1.0
The v1.0 spec (March 2026) formalized what the preview hinted at. Key changes we had to adapt to:
Method names standardized — a2a.SendMessage, a2a.GetTask, a2a.CancelTask, a2a.ListTasks. We migrated from preview equivalents.
Task state enums uppercased — TASK_STATE_SUBMITTED, TASK_STATE_WORKING instead of lowercase. ProtoJSON conventions throughout.
Agent Cards signed — JWS + JSON Canonicalization Scheme (RFC 8785). We reuse our existing Ed25519 (RFC 8032) signing infrastructure.
Dual Part format — v1.0 uses a direct data field instead of the preview's kind discriminator. We accept both on input and emit v1.0 only.
Error model shift — Spec requires google.rpc.Status format. Our JSON-RPC errors needed restructuring.
Security requirements — Agent Cards now declare securityRequirements (OAuth2, API key, mTLS, OpenID Connect).
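The dual Part handling can be sketched as a small normalizer. This is an illustrative sketch based on the description above (a preview-style kind discriminator versus v1.0 direct fields); the function name and exact field shapes are ours, not the spec's.

```python
# Hypothetical sketch: normalize an incoming A2A Part to v1.0 shape.
# Accepts both the preview "kind" discriminator and v1.0 direct fields.

def normalize_part(part: dict) -> dict:
    """Accept preview-style and v1.0-style Parts, emit v1.0 only."""
    if "kind" in part:  # preview format, e.g. {"kind": "text", "text": "..."}
        kind = part["kind"]
        if kind in ("text", "data", "file"):
            return {kind: part[kind]}
        raise ValueError(f"unknown preview part kind: {kind}")
    return part  # already v1.0: exactly one of text / data / file is set
```

Accepting both formats on input while emitting only v1.0 keeps the migration one-directional: old peers can still talk to us, but everything we produce is spec-current.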
We shipped A2A support in AGLedger API v0.18.0–0.18.1 (adapting to the open-source A2A spec) with 59 unit tests plus 8 new integration tests. Six findings (F-348 through F-353) were caught and resolved during the migration.
The accountability gap
A2A defines an 8-state task lifecycle. AGLedger defines a 17-state mandate lifecycle. They look similar. They are not.
| Capability | A2A v1.0 | AGLedger |
|---|---|---|
| Task delegation | Yes | Yes (mandates) |
| Structured acceptance criteria | No | Yes (contract types, JSON Schema) |
| Independent verdict | No — agent self-reports | Yes — principal renders PASS/FAIL |
| Tolerance checking | No | Yes (numeric bounds, auto-settle) |
| Tamper-evident audit trail | No | Yes (Ed25519 + SHA-256 hash chain) |
| Dispute resolution | No | Yes (3-tier, 6 grounds) |
| Delegation constraints | No | Yes (constraint inheritance through chains) |
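The tamper-evident audit trail row refers to a SHA-256 hash chain. A minimal sketch of the construction follows; each entry commits to its predecessor, so editing any earlier event invalidates every later hash. This is the standard chaining idea, not AGLedger's wire format, and the Ed25519 signing step is omitted.

```python
import hashlib
import json

def chain_append(prev_hash: str, event: dict) -> str:
    """Hash-chain an audit event: the digest covers both the event
    and the previous entry's hash, making history append-only."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

genesis = "0" * 64
h0 = chain_append(genesis, {"type": "mandate.created", "id": "m-1"})
h1 = chain_append(h0, {"type": "receipt.submitted", "id": "m-1"})
```

In a full implementation each link would also carry an Ed25519 signature (RFC 8032) over the digest, so both integrity and authorship are verifiable.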
The gap is structural, not implementational. A2A tells agents how to talk. It does not tell anyone whether the work was done to spec.
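To make "done to spec" concrete, here is a sketch of the principal-side verdict step the table describes: checking reported evidence against numeric tolerance bounds and rendering PASS/FAIL. The function name, bounds format, and metric names are illustrative, not AGLedger's actual API.

```python
def check_tolerances(evidence: dict, bounds: dict) -> str:
    """Return "PASS" if every bounded metric is present and within
    its [lo, hi] tolerance, else "FAIL"."""
    for metric, (lo, hi) in bounds.items():
        value = evidence.get(metric)
        if value is None or not (lo <= value <= hi):
            return "FAIL"
    return "PASS"

verdict = check_tolerances(
    {"p99_latency_ms": 180, "error_rate": 0.002},
    {"p99_latency_ms": (0, 200), "error_rate": (0.0, 0.01)},
)
```

A PASS verdict here is what allows auto-settlement: no human in the loop, but also no self-reporting, because the check runs against the contract, not the agent's claim.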
27 experiments
We gave three LLM agents (Claude Haiku, Gemini 2.5 Flash, GPT-4o-mini) a collaborative cloud infrastructure migration task. AGLedger's contract spec was their coordination tool. We varied the interface, the tool count, the tool descriptions, and the available contract types. 30 runs. 27 documented findings (EXP-01 through EXP-27).
Three findings changed how we build:
Agents declare intent but never close the loop
In our baseline run, agents created 8 mandates (declarations of what they intended to do) and submitted zero receipts (evidence of delivery). The lifecycle stopped at ACTIVE. The declaration side works. The closure side does not happen naturally.
This is not a bug in the agents. It is a design gap. A2A's lifecycle ends when the agent says “done.” There is no structural incentive to submit evidence.
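One way to make closure structural rather than optional is to make the terminal state unreachable without evidence. A minimal sketch, with state and method names that are illustrative rather than AGLedger's actual 17-state lifecycle:

```python
class Mandate:
    """Toy mandate: settlement is gated on at least one receipt."""

    def __init__(self) -> None:
        self.state = "ACTIVE"
        self.receipts: list[dict] = []

    def submit_receipt(self, evidence: dict) -> None:
        self.receipts.append(evidence)

    def settle(self) -> None:
        # Structural incentive: "done" cannot be declared, only earned.
        if not self.receipts:
            raise RuntimeError("cannot settle: no receipt submitted")
        self.state = "SETTLED"
```

Contrast this with a lifecycle that ends when the agent calls a bare complete_task: the guard above is the difference between self-reporting and accountability.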
Interface design beats prompt engineering — 32x
We tested two approaches to increase receipt submission:
Prompt fix — tool description says “submit_receipt — this is REQUIRED”
Result: receipt ratio went from 20% to 640%. The word REQUIRED is load-bearing.
Interface fix — remove the task dispatcher entirely, make accountability tools the only path
Result: 6 proposals, 56 receipts, 41 auto-settled, 0 dispatcher calls. 933% receipt ratio.
Prompt engineering produced a 32x improvement. Removing the alternative path produced a 47x improvement. When accountability is the only interface, agents use it.
Interface shapes behavior. This is the single most important finding from our A2A work.
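In practice the interface fix can be as simple as filtering the tool list exposed to the agent. A sketch, with tool names that are illustrative:

```python
ALL_TOOLS = {
    "dispatch_task": "fire-and-forget delegation",
    "complete_task": "self-reported completion",
    "propose_mandate": "declare intended work with acceptance criteria",
    "submit_receipt": "submit evidence of delivery",
    "settle_mandate": "request a verdict against the contract",
}

# Interface fix: remove the dispatcher so the accountability
# tools are the only path from "working" to "done".
ACCOUNTABILITY_PROFILE = {
    name: desc for name, desc in ALL_TOOLS.items()
    if name not in ("dispatch_task", "complete_task")
}
```

No prompt text is needed: with complete_task gone, the cheapest way for the agent to finish a task is to submit a receipt.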
Tool count kills compliance
With 36 tools available (31 AGLedger + 5 dispatcher), receipt submission dropped to 0%. Research confirms tool selection accuracy drops from 43% to 14% as tool counts grow (RAG-MCP, arXiv 2505.03275). Agents treated the simpler complete_task dispatcher call as sufficient. Why submit structured evidence when you can just say “done”?
With a focused 10-tool A2A profile: 6 proposals, 6 accepts, 8 receipts, 7/7 settlements. All three providers engaged.
Model behavior divergence
Same interface, different behavior:
Budget models (Haiku, Flash) — fewer mandates, 609% receipt ratio. Over-receipt. They do the work.
Premium models (Sonnet, Opus) — many proposals, 18% receipt ratio. Over-propose. They plan the work.
GPT-4o-mini — cannot reliably construct evidence objects from schema descriptions: 38 of 39 attempts failed in Run 23.
No one-size-fits-all interface. AGLedger's MCP server ships multiple tool profiles because the right interface depends on the model.
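A per-model profile switch can be sketched as follows. The mapping and profile names are illustrative, not AGLedger's actual configuration:

```python
def profile_for(model: str) -> str:
    """Pick a tool profile by model family (mapping is illustrative)."""
    if model in ("claude-haiku", "gemini-2.5-flash"):
        # Budget models over-receipt: a lean, focused profile is enough.
        return "focused-10"
    if model in ("claude-sonnet", "claude-opus"):
        # Premium models over-propose: keep planning tools available.
        return "full"
    # Models that struggle to build evidence objects (e.g. GPT-4o-mini)
    # get a profile with stricter, example-laden tool schemas.
    return "guided"
```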
What we shipped
Based on these findings, AGLedger v0.18.0–0.18.1 shipped:
A2A v1.0 Agent Card at /.well-known/agent-card.json with skills mapped to accountability operations
JSON-RPC 2.0 binding — a2a.SendMessage, a2a.GetTask, a2a.CancelTask, a2a.ListTasks
Two new contract types — ACH-ANALYZE-v1 (cognitive work) and ACH-COORD-v1 (coordination), added because agents told us the existing types did not cover their tasks
Focused 10-tool A2A profile for MCP — propose, accept, list, submit, settle, transition, get, search, reputation, help
Auto-verify — eliminated verification latency. 43 receipts, 0 errors, 0 dispatcher calls in testing
urn:agledger:* extensions for accountability semantics (mandate metadata, verdict, audit proof, receipt schema, dispute, reputation)
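As a sketch of the JSON-RPC 2.0 binding, a minimal a2a.SendMessage envelope might look like the following. The message shape is simplified from the v1.0 spec, and the helper name is ours:

```python
import json
import uuid

def send_message_request(text: str) -> str:
    """Build a JSON-RPC 2.0 envelope for a2a.SendMessage (payload simplified)."""
    req = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "a2a.SendMessage",
        "params": {
            "message": {
                "role": "user",
                # v1.0 Part with a direct field, no preview "kind" discriminator
                "parts": [{"text": text}],
            }
        },
    }
    return json.dumps(req)
```

The same envelope shape, with method swapped to a2a.GetTask, a2a.CancelTask, or a2a.ListTasks, covers the rest of the binding.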
Where AGLedger sits
A2A is the coordination protocol. AGLedger is the accountability layer. They are complementary, not competing.
A2A tells agents how to delegate. AGLedger answers the question no other protocol in the stack addresses: was the work actually completed to spec, and can you prove it?
A2A tells agents how to talk. AGLedger tells everyone whether the work was done.
Sources & further reading
A2A v1.0 Specification — Agent-to-Agent Protocol, Linux Foundation
A2A v1.0 Announcement — March 2026 release notes
Google A2A Announcement — Original protocol announcement (April 2025)
A2A GitHub — Specification source, SDKs, and samples
Model Context Protocol — Anthropic's open protocol for LLM tool integration
AP2 Specification — Agent Payment Protocol (Google + Coinbase)
Universal Commerce Protocol — Commerce standard for AI agents (Google + Shopify)
RFC 8032 — Ed25519 Digital Signatures
RFC 8785 — JSON Canonicalization Scheme
RFC 9421 — HTTP Message Signatures
arXiv 2505.03275 — RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection
Stripe Webhook Signatures — Industry-standard HMAC-SHA256 webhook verification pattern