# Zero-Scaffolding API Discovery: Can Agents Learn Your API from Scratch?
If you're building APIs that AI agents will consume, here is the question that matters: can an agent with no prior knowledge of your API discover it, learn it, and complete a full workflow — without SDKs, without documentation, without training?
We tested this. The answer is yes — but only if the API is designed for it.
Our testbed ran a series of zero-scaffolding experiments against the AGLedger API. An agent was given a single tool — `http_request` — and a starting URL. No SDK. No MCP tools. No curated tool descriptions. Just raw HTTP and whatever the API exposes about itself.
## The experiment
We tested three integration modes across three LLM providers (Claude Haiku, Gemini Flash, GPT-4o-mini). Each mode represents a different level of scaffolding an API consumer might have:
- **HTTP + /llms.txt** — Agent gets only `http_request` and the base URL. It discovers `/llms.txt` on its own, reads the API map, then navigates endpoints to complete a procurement mandate lifecycle.
- **MCP (11 tools)** — Agent gets a curated set of MCP tools wrapping the API. No raw HTTP access.
- **SDK (typed tools)** — Agent gets strongly-typed SDK method wrappers with full type definitions.
The task was identical across all modes: create a procurement mandate, submit a receipt with evidence, and reach a terminal state (FULFILLED).
## Results
| Mode | Mandate Created | Receipt Submitted | Lifecycle Complete | Avg HTTP Calls |
|---|---|---|---|---|
| HTTP + /llms.txt | 3/3 (100%) | 3/3 (100%) | 3/3 FULFILLED | ~6 |
| MCP (11 tools) | 3/3 (100%) | 3/3 (100%) | 0/3 (ACTIVE) | — |
| SDK (typed tools) | 0/3 (0%) | 0/3 (0%) | 0/3 | — |
The HTTP agent discovered the API, learned the lifecycle, and completed it in approximately 6 HTTP calls and 24 seconds. No prior knowledge. No documentation beyond what the API itself serves.
The SDK agent — with the most scaffolding of the three — achieved 0% mandate creation.
## Why raw HTTP outperformed SDKs
This result is counterintuitive. SDKs are supposed to make integration easier. They do — for human developers. For agents, they introduced three compounding failures:
- **Field name mismatch.** The SDK tool wrapper exposed a `type` parameter, but the API expects `contractType`. Agents used the tool parameter name, not the API field name.
- **Missing required fields.** The SDK tool didn't surface `contractVersion` and `platform` as required parameters, so agents omitted them.
- **Schema strictness with no escape hatch.** The agent read the schema and saw required fields, but also added extra fields the API rejected as `additionalProperties`. The HTTP agent recovered by retrying with exact fields from the schema. The SDK agent kept including fields the tool parameter allowed but the API rejected.
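The failure mode above can be sketched with a toy validator. This is an illustrative reconstruction, not the AGLedger implementation: the field names (`contractType`, `contractVersion`, `platform`, `objective`) come from the experiment description, while the validator itself is a stand-in for the API's `additionalProperties: false` schema check.

```python
# Toy model of the API's strict schema: all required fields must be present,
# and nothing beyond them is accepted (additionalProperties: false).
REQUIRED = {"contractType", "contractVersion", "platform", "objective"}
ALLOWED = REQUIRED

def validate(payload: dict) -> list[str]:
    """Return the validation errors a strict API would raise for this payload."""
    errors = [f"must have required property '{f}'" for f in sorted(REQUIRED - payload.keys())]
    errors += [f"must NOT have additional property '{f}'" for f in sorted(payload.keys() - ALLOWED)]
    return errors

# What the SDK-guided agent sends: the tool's parameter name, not the API's field name.
sdk_payload = {"type": "procurement", "objective": "buy widgets"}

# What the HTTP agent sends after reading the schema itself.
http_payload = {
    "contractType": "procurement",
    "contractVersion": "1.0",
    "platform": "agledger",
    "objective": "buy widgets",
}

assert validate(http_payload) == []
assert validate(sdk_payload) == [
    "must have required property 'contractType'",
    "must have required property 'contractVersion'",
    "must have required property 'platform'",
    "must NOT have additional property 'type'",
]
```

The SDK agent never sees the last error coming, because the wrapper's parameter list said `type` was fine.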
The core insight: typed SDK tools that accept freeform criteria cannot enforce per-type schema constraints. The agent doesn't know what fields are forbidden, only what fields are required. Raw JSON gives agents more control to self-correct.
## The /llms.txt pattern
The HTTP agent's first action in every run was to fetch /llms.txt. This file is a machine-readable map of the API: available endpoints, authentication requirements, lifecycle transitions, and schema discovery paths.
From that single file, the agent constructed a mental model of the full mandate lifecycle (DRAFT → REGISTERED → ACTIVE → receipt → FULFILLED) and executed it without a single failed create attempt in the best runs. When it did hit errors — a missing `agentId` or `evidence` field on receipt submission — the API's error messages were specific enough to self-correct in one retry.
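The planning step the agent performs can be sketched as a search over the lifecycle graph. The states come from the lifecycle above; the transition names (`register`, `activate`, `submit_receipt`) are illustrative assumptions about what `/llms.txt` describes, not its exact contents:

```python
from collections import deque

# Lifecycle map as the agent might reconstruct it from /llms.txt.
TRANSITIONS = {
    "DRAFT": {"register": "REGISTERED"},
    "REGISTERED": {"activate": "ACTIVE"},
    "ACTIVE": {"submit_receipt": "FULFILLED"},
    "FULFILLED": {},
}

def plan(start: str, goal: str):
    """Shortest action sequence from start to goal (BFS over the state graph)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in TRANSITIONS[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None

assert plan("DRAFT", "FULFILLED") == ["register", "activate", "submit_receipt"]
```

One document read, one plan, then execution — which is why the best runs had zero failed create attempts.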
Across 15 agent runs in our broader test suite, 15 out of 15 agents independently asked for upfront schema documentation — which already exists at `GET /schemas/{type}`. The agents just didn't know the endpoint was there until they hit an error. The fix was simple: include a `schemaUrl` field in validation error responses so agents follow the link instead of guessing.
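The shape such an error response might take — a hedged sketch, where the field names (`errors`, `schemaUrl`) and the schema path are illustrative assumptions rather than the exact AGLedger wire format:

```python
import json

# Hypothetical 400 body for the receipt-submission errors described above.
error_body = {
    "status": 400,
    "errors": [
        "must have required property 'agentId'",
        "must have required property 'evidence'",
    ],
    # The one-line fix: link the schema so the agent follows it instead of guessing.
    "schemaUrl": "/schemas/receipt",
}

print(json.dumps(error_body, indent=2))
```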
The best API documentation for agents is the API itself. Error messages are your onboarding flow.
## Cross-provider validation
The zero-scaffolding pattern held across providers when we expanded the test to include the full agent DX suite (curated tools, 5 tasks each):
| Provider | Lifecycle | FULFILLED | Receipt | Tokens | Time |
|---|---|---|---|---|---|
| Gemini Flash | 5/5 (100%) | 5/5 (100%) | 5/5 (100%) | 69K | 78s |
| GPT-4o-mini | 5/5 (100%) | 4/5 (80%) | 5/5 (100%) | 70K | 86s |
| Bedrock Haiku | 5/5 (100%) | 3/5 (60%) | 5/5 (100%) | 123K | 101s |
100% lifecycle completion across all three providers. Non-FULFILLED outcomes were legitimate business failures (tolerance violations), not agent failures — the agents completed the lifecycle correctly.
For context: the previous test run (two days earlier) showed 0% receipt submission across all providers. The API didn't change. What changed was the error message clarity and schema discoverability — agents started self-correcting from 400 errors by fetching the schema endpoint.
## Implications for API designers
If you are building APIs that agents will consume — and increasingly, all APIs will be consumed by agents — these findings translate to concrete design choices:
- **Invest in /llms.txt.** A machine-readable API map at a well-known path is the single highest-leverage discoverability investment. Every agent in our test suite read it first and used it to plan its entire workflow.
- **Make error messages actionable.** “Must have required property ‘objective’” is a recoverable error. “Bad request” is not. Our agents went from 0% to 100% receipt submission when error messages became specific enough to self-correct from.
- **Link schemas from errors.** When validation fails, include the schema URL in the error response. Agents follow links. They don't guess.
- **Don't assume SDKs are easier for agents.** Type-safe wrappers constrain agents in ways raw HTTP does not. If your SDK accepts freeform fields that the API validates strictly, agents get trapped in retry loops with no escape. Native HTTP gave agents the flexibility to self-correct.
- **Design for self-correction, not perfection.** Agents will not get the first request right. They will get the third request right if your error responses teach them how. The median HTTP agent completed the lifecycle in 6 calls — some of those were learning calls, and that is fine.
- **Test with real LLMs, not just unit tests.** The SDK mode passed every unit test we wrote. It failed 100% of the time when an actual agent used it. The gap between “this works” and “an agent can use this” is real and measurable.
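The self-correction loop these design choices enable can be sketched with a stubbed transport in place of a live API. Everything here is illustrative: the endpoint paths, field names, and error shape are assumptions, and `fake_http` stands in for the real `http_request` tool.

```python
# Strict schema the fake API validates against (names are illustrative).
SCHEMA = {"required": ["contractType", "contractVersion", "platform", "objective"]}

def fake_http(method, url, body=None):
    """Stand-in for the http_request tool, backed by an in-memory fake API."""
    if url == "/schemas/procurement":
        return {"status": 200, "json": SCHEMA}
    missing = [f for f in SCHEMA["required"] if f not in (body or {})]
    if missing:
        return {"status": 400, "json": {
            "errors": [f"must have required property '{f}'" for f in missing],
            "schemaUrl": "/schemas/procurement",  # the link agents follow
        }}
    return {"status": 201, "json": {"id": "mandate-1", "state": "DRAFT"}}

def create_mandate(payload, max_retries=2):
    """Try to create; on 400, fetch the linked schema and fill the gaps."""
    for _ in range(max_retries + 1):
        resp = fake_http("POST", "/mandates", payload)
        if resp["status"] == 201:
            return resp["json"]
        schema = fake_http("GET", resp["json"]["schemaUrl"])["json"]
        for field in schema["required"]:
            payload.setdefault(field, f"<{field}>")  # placeholder fill for the sketch
    raise RuntimeError("could not self-correct")

# First request fails validation; the second succeeds after following schemaUrl.
mandate = create_mandate({"objective": "buy widgets"})
assert mandate["state"] == "DRAFT"
```

Note that the loop only works because the 400 response carries both specific errors and a schema link — remove either and the agent is back to guessing.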
## How we applied this at AGLedger
These experiments directly shaped the AGLedger API. We treat agent discoverability as a first-class design constraint, not a documentation afterthought:
- /llms.txt ships with every API deployment. It describes endpoints, authentication, lifecycle transitions, and schema paths in a format agents parse natively.
- Validation errors include schema URLs so agents self-correct in one retry instead of two.
- The API is the primary integration path. SDKs exist for TypeScript and Python, but native HTTP is what we test first and optimize for. If an agent can complete a mandate lifecycle with nothing but `http_request`, the API is working.
- The testbed runs these zero-scaffolding tests on every API build. If an agent can't discover the API from scratch, the build fails. Agent discoverability is a regression test, not a marketing claim.
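A build gate like that can be as simple as asserting the API map is complete before anything ships. This is a hedged sketch: the `/llms.txt` contents and the markers checked are assumptions, not the actual testbed code.

```python
# Hypothetical /llms.txt served by a deployment (contents are illustrative).
LLMS_TXT = """\
# AGLedger API
## Endpoints
POST /mandates
POST /mandates/{id}/receipts
GET /schemas/{type}
## Lifecycle
DRAFT -> REGISTERED -> ACTIVE -> FULFILLED
"""

def discoverable(llms_txt: str) -> bool:
    """Build gate: the API map must expose endpoints, schema paths, and lifecycle."""
    return all(marker in llms_txt for marker in (
        "POST /mandates", "GET /schemas/", "FULFILLED",
    ))

assert discoverable(LLMS_TXT)        # a complete map passes the gate
assert not discoverable("# AGLedger API")  # a bare stub fails it
```

The full testbed goes further and runs a live agent, but even a static check like this catches a deployment that forgot to ship its map.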
## The numbers
Summary across all zero-scaffolding experiments (March 2026):
- HTTP + /llms.txt: 100% lifecycle completion, ~6 HTTP calls median, ~24s average, ~37K tokens (Bedrock Haiku)
- MCP (11 tools): 100% receipt submission, 0% terminal state (async verification gap — not an agent failure)
- SDK (typed tools): 0% mandate creation due to schema strictness + freeform field conflict
- Cross-provider: All three providers (Claude Haiku, Gemini Flash, GPT-4o-mini) achieved 100% lifecycle completion with curated tools
- Self-correction: 15/15 agents across providers self-corrected from schema validation errors when error messages were specific
- Previous vs current: Receipt submission went from 0% to 100% without API changes — error message improvements alone closed the gap
## Sources & further reading
- AGLedger Testbed — overnight run report, 2026-03-29
- A2A Contract Spec Experiment Findings (EXP-10: Raw API Access Dramatically Improves Spec Adoption)
- llms.txt specification — llmstxt.org
- OpenAPI Specification
- Model Context Protocol specification
- Anthropic — Tool use documentation
- AGLedger API documentation — /api
- AGLedger live demo — demo.agledger.ai