# Zero-Scaffolding API Discovery: Can Agents Learn Your API from Scratch?
If you're building APIs that AI agents will consume, here is the question that matters: can an agent with no prior knowledge of your API discover it, learn it, and complete a full workflow — without SDKs, without documentation, without training?
We tested this. The answer is yes — but only if the API is designed for it.
Our testbed ran a series of zero-scaffolding experiments against the AGLedger API. An agent was given a single tool — `http_request` — and a starting URL. No SDK. No MCP tools. No curated tool descriptions. Just raw HTTP and whatever the API exposes about itself.
## The experiment
We tested three integration modes across three LLM providers (Claude Haiku, Gemini Flash, GPT-4o-mini). Each mode represents a different level of scaffolding an API consumer might have:
- **HTTP + /llms.txt** — Agent gets only `http_request` and the base URL. It discovers `/llms.txt` on its own, reads the API map, then navigates endpoints to complete a procurement mandate lifecycle.
- **MCP (11 tools)** — Agent gets a curated set of MCP tools wrapping the API. No raw HTTP access.
- **SDK (typed tools)** — Agent gets strongly-typed SDK method wrappers with full type definitions.
The task was identical across all modes: create a procurement mandate, submit a receipt with evidence, and reach a terminal state (FULFILLED).
## Results
| Mode | Mandate Created | Receipt Submitted | Lifecycle Complete | Avg HTTP Calls |
|---|---|---|---|---|
| HTTP + /llms.txt | 3/3 (100%) | 3/3 (100%) | 3/3 FULFILLED | ~6 |
| MCP (11 tools) | 3/3 (100%) | 3/3 (100%) | 0/3 (ACTIVE) | — |
| SDK (typed tools) | 0/3 (0%) | 0/3 (0%) | 0/3 | — |
The HTTP agent discovered the API, learned the lifecycle, and completed it in approximately 6 HTTP calls and 24 seconds. No prior knowledge. No documentation beyond what the API itself serves.
The SDK agent — with the most scaffolding of the three — achieved 0% mandate creation.
## Why raw HTTP outperformed SDKs
This result is counterintuitive. SDKs are supposed to make integration easier. They do — for human developers. For agents, they introduced three compounding failures:
- **Field name mismatch.** The SDK tool wrapper exposed a `type` parameter, but the API expects `contractType`. Agents used the tool parameter name, not the API field name.
- **Missing required fields.** The SDK tool didn't surface `contractVersion` and `platform` as required parameters, so agents omitted them.
- **Schema strictness with no escape hatch.** The agent read the schema and saw required fields, but also added extra fields the API rejected as `additionalProperties`. The HTTP agent recovered by retrying with exact fields from the schema. The SDK agent kept including fields the tool parameter allowed but the API rejected.
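The failure mode above can be sketched with a toy validator. This is an illustrative reconstruction, not the AGLedger implementation: the field names (`contractType`, `contractVersion`, `platform`, `objective`) come from the experiment description, while the validator itself is a stand-in for the API's `additionalProperties: false` schema check.

```python
# Toy model of the API's strict schema: all required fields must be present,
# and nothing beyond them is accepted (additionalProperties: false).
REQUIRED = {"contractType", "contractVersion", "platform", "objective"}
ALLOWED = REQUIRED

def validate(payload: dict) -> list[str]:
    """Return the validation errors a strict API would raise for this payload."""
    errors = [f"must have required property '{f}'" for f in sorted(REQUIRED - payload.keys())]
    errors += [f"must NOT have additional property '{f}'" for f in sorted(payload.keys() - ALLOWED)]
    return errors

# What the SDK-guided agent sends: the tool's parameter name, not the API's field name.
sdk_payload = {"type": "procurement", "objective": "buy widgets"}

# What the HTTP agent sends after reading the schema itself.
http_payload = {
    "contractType": "procurement",
    "contractVersion": "1.0",
    "platform": "agledger",
    "objective": "buy widgets",
}

assert validate(http_payload) == []
assert validate(sdk_payload) == [
    "must have required property 'contractType'",
    "must have required property 'contractVersion'",
    "must have required property 'platform'",
    "must NOT have additional property 'type'",
]
```

The SDK agent never sees the last error coming, because the wrapper's parameter list said `type` was fine.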
The core insight: typed SDK tools that accept freeform criteria cannot enforce per-type schema constraints. The agent doesn't know what fields are forbidden, only what fields are required. Raw JSON gives agents more control to self-correct.
## The /llms.txt pattern
The HTTP agent's first action in every run was to fetch /llms.txt. This file is a machine-readable map of the API: available endpoints, authentication requirements, lifecycle transitions, and schema discovery paths.
From that single file, the agent constructed a mental model of the full mandate lifecycle (DRAFT → REGISTERED → ACTIVE → receipt → FULFILLED) and executed it without a single failed create attempt in the best runs. When it did hit errors — a missing `agentId` or `evidence` field on receipt submission — the API's error messages were specific enough to self-correct in one retry.
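The planning step the agent performs can be sketched as a search over the lifecycle graph. The states come from the lifecycle above; the transition names (`register`, `activate`, `submit_receipt`) are illustrative assumptions about what `/llms.txt` describes, not its exact contents:

```python
from collections import deque

# Lifecycle map as the agent might reconstruct it from /llms.txt.
TRANSITIONS = {
    "DRAFT": {"register": "REGISTERED"},
    "REGISTERED": {"activate": "ACTIVE"},
    "ACTIVE": {"submit_receipt": "FULFILLED"},
    "FULFILLED": {},
}

def plan(start: str, goal: str):
    """Shortest action sequence from start to goal (BFS over the state graph)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in TRANSITIONS[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None

assert plan("DRAFT", "FULFILLED") == ["register", "activate", "submit_receipt"]
```

One document read, one plan, then execution — which is why the best runs had zero failed create attempts.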
Across 15 agent runs in our broader test suite, 15 out of 15 agents independently asked for upfront schema documentation — which already exists at `GET /schemas/{type}`. The agents just didn't know the endpoint was there until they hit an error. The fix was simple: include a `schemaUrl` field in validation error responses so agents follow the link instead of guessing.
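The shape such an error response might take — a hedged sketch, where the field names (`errors`, `schemaUrl`) and the schema path are illustrative assumptions rather than the exact AGLedger wire format:

```python
import json

# Hypothetical 400 body for the receipt-submission errors described above.
error_body = {
    "status": 400,
    "errors": [
        "must have required property 'agentId'",
        "must have required property 'evidence'",
    ],
    # The one-line fix: link the schema so the agent follows it instead of guessing.
    "schemaUrl": "/schemas/receipt",
}

print(json.dumps(error_body, indent=2))
```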
The best API documentation for agents is the API itself. Error messages are your onboarding flow.
## Cross-provider validation
The zero-scaffolding pattern held across providers when we expanded the test to include the full agent DX suite (curated tools, 5 tasks each):
| Provider | Lifecycle | FULFILLED | Receipt | Tokens | Time |
|---|---|---|---|---|---|
| Gemini Flash | 5/5 (100%) | 5/5 (100%) | 5/5 (100%) | 69K | 78s |
| GPT-4o-mini | 5/5 (100%) | 4/5 (80%) | 5/5 (100%) | 70K | 86s |
| Bedrock Haiku | 5/5 (100%) | 3/5 (60%) | 5/5 (100%) | 123K | 101s |
100% lifecycle completion across all three providers. Non-FULFILLED outcomes were legitimate business failures (tolerance violations), not agent failures — the agents completed the lifecycle correctly.
For context: the previous test run (two days earlier) showed 0% receipt submission across all providers. The API didn't change. What changed was the error message clarity and schema discoverability — agents started self-correcting from 400 errors by fetching the schema endpoint.
## Implications for API designers
If you are building APIs that agents will consume — and increasingly, all APIs will be consumed by agents — these findings translate to concrete design choices:
- **Invest in /llms.txt.** A machine-readable API map at a well-known path is the single highest-leverage discoverability investment. Every agent in our test suite read it first and used it to plan its entire workflow.
- **Make error messages actionable.** “Must have required property ‘objective’” is a recoverable error. “Bad request” is not. Our agents went from 0% to 100% receipt submission when error messages became specific enough to self-correct from.
- **Link schemas from errors.** When validation fails, include the schema URL in the error response. Agents follow links. They don't guess.
- **Don't assume SDKs are easier for agents.** Type-safe wrappers constrain agents in ways raw HTTP does not. If your SDK accepts freeform fields that the API validates strictly, agents get trapped in retry loops with no escape. Native HTTP gave agents the flexibility to self-correct.
- **Design for self-correction, not perfection.** Agents will not get the first request right. They will get the third request right if your error responses teach them how. The median HTTP agent completed the lifecycle in 6 calls — some of those were learning calls, and that is fine.
- **Test with real LLMs, not just unit tests.** The SDK mode passed every unit test we wrote. It failed 100% of the time when an actual agent used it. The gap between “this works” and “an agent can use this” is real and measurable.
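The self-correction loop these design choices enable can be sketched with a stubbed transport in place of a live API. Everything here is illustrative: the endpoint paths, field names, and error shape are assumptions, and `fake_http` stands in for the real `http_request` tool.

```python
# Strict schema the fake API validates against (names are illustrative).
SCHEMA = {"required": ["contractType", "contractVersion", "platform", "objective"]}

def fake_http(method, url, body=None):
    """Stand-in for the http_request tool, backed by an in-memory fake API."""
    if url == "/schemas/procurement":
        return {"status": 200, "json": SCHEMA}
    missing = [f for f in SCHEMA["required"] if f not in (body or {})]
    if missing:
        return {"status": 400, "json": {
            "errors": [f"must have required property '{f}'" for f in missing],
            "schemaUrl": "/schemas/procurement",  # the link agents follow
        }}
    return {"status": 201, "json": {"id": "mandate-1", "state": "DRAFT"}}

def create_mandate(payload, max_retries=2):
    """Try to create; on 400, fetch the linked schema and fill the gaps."""
    for _ in range(max_retries + 1):
        resp = fake_http("POST", "/mandates", payload)
        if resp["status"] == 201:
            return resp["json"]
        schema = fake_http("GET", resp["json"]["schemaUrl"])["json"]
        for field in schema["required"]:
            payload.setdefault(field, f"<{field}>")  # placeholder fill for the sketch
    raise RuntimeError("could not self-correct")

# First request fails validation; the second succeeds after following schemaUrl.
mandate = create_mandate({"objective": "buy widgets"})
assert mandate["state"] == "DRAFT"
```

Note that the loop only works because the 400 response carries both specific errors and a schema link — remove either and the agent is back to guessing.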
## How we applied this at AGLedger
These experiments directly shaped the AGLedger API. We treat agent discoverability as a first-class design constraint, not a documentation afterthought:
- /llms.txt ships with every API deployment. It describes endpoints, authentication, lifecycle transitions, and schema paths in a format agents parse natively.
- Validation errors include schema URLs so agents self-correct in one retry instead of two.
- The API is the primary integration path. SDKs exist for TypeScript and Python, but native HTTP is what we test first and optimize for. If an agent can complete a mandate lifecycle with nothing but `http_request`, the API is working.
- The testbed runs these zero-scaffolding tests on every API build. If an agent can't discover the API from scratch, the build fails. Agent discoverability is a regression test, not a marketing claim.
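A build gate like that can be as simple as asserting the API map is complete before anything ships. This is a hedged sketch: the `/llms.txt` contents and the markers checked are assumptions, not the actual testbed code.

```python
# Hypothetical /llms.txt served by a deployment (contents are illustrative).
LLMS_TXT = """\
# AGLedger API
## Endpoints
POST /mandates
POST /mandates/{id}/receipts
GET /schemas/{type}
## Lifecycle
DRAFT -> REGISTERED -> ACTIVE -> FULFILLED
"""

def discoverable(llms_txt: str) -> bool:
    """Build gate: the API map must expose endpoints, schema paths, and lifecycle."""
    return all(marker in llms_txt for marker in (
        "POST /mandates", "GET /schemas/", "FULFILLED",
    ))

assert discoverable(LLMS_TXT)        # a complete map passes the gate
assert not discoverable("# AGLedger API")  # a bare stub fails it
```

The full testbed goes further and runs a live agent, but even a static check like this catches a deployment that forgot to ship its map.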
## The numbers
Summary across all zero-scaffolding experiments (March 2026):
- HTTP + /llms.txt: 100% lifecycle completion, ~6 HTTP calls median, ~24s average, ~37K tokens (Bedrock Haiku)
- MCP (11 tools): 100% receipt submission, 0% terminal state (async verification gap — not an agent failure)
- SDK (typed tools): 0% mandate creation due to schema strictness + freeform field conflict
- Cross-provider: All three providers (Claude Haiku, Gemini Flash, GPT-4o-mini) achieved 100% lifecycle completion with curated tools
- Self-correction: 15/15 agents across providers self-corrected from schema validation errors when error messages were specific
- Previous vs current: Receipt submission went from 0% to 100% without API changes — error message improvements alone closed the gap
## Sources & further reading
- AGLedger Testbed — overnight run report, 2026-03-29
- A2A Contract Spec Experiment Findings (EXP-10: Raw API Access Dramatically Improves Spec Adoption)
- llms.txt specification — llmstxt.org
- OpenAPI Specification
- Model Context Protocol specification
- Anthropic — Tool use documentation
- AGLedger API documentation — /api
- AGLedger live demo — demo.agledger.ai