ISO 42001 Certification Evidence: What Auditors Actually Want to See
ISO/IEC 42001:2023 certification is on the horizon for any enterprise running AI agents in production. The standard is clear about what an AI management system requires. What it doesn't spell out is what the auditor's evidence request actually looks like. This post walks through clauses 4–10 with practical examples of what evidence each clause demands, what structured accountability records can generate automatically, and what your organization still needs to provide.
The evidence gap
Most organizations preparing for ISO 42001 start with documentation. Policy documents, risk registers, process narratives. These are necessary — the standard requires them. But when the auditor arrives, documentation is table stakes. What they actually want is evidence that the management system is operating. Not a description of what should happen, but proof of what did happen.
For AI agent operations, this creates a specific challenge. Agents execute across context windows, across providers, across time. The work happens in API calls, tool invocations, and LLM completions. If your evidence strategy depends on someone going back and documenting what the agent did, you're reconstructing — and reconstruction is exactly what auditors are trained to distrust.
The alternative: structure agent work so that the evidence is a byproduct. If agents follow a protocol that captures the commitment before work starts, records the delivery, and documents the acceptance decision, then the audit evidence already exists when the auditor asks for it.
Clause 4 — Context of the organization
What the auditor wants: Evidence that you've identified the boundaries of your AI systems, documented stakeholder needs, and defined the scope of your AI management system. For agent operations, this means showing which agents operate in which domains, what inter-organizational boundaries exist, and how risk levels are classified.
What structured records provide: Every mandate includes risk level classification and domain tags. Federation configurations document inter-organizational AI system boundaries. Custom schemas define structured data contracts per domain. When the auditor asks “show me how you scope your AI operations,” you export the mandate history filtered by domain, risk level, or organizational boundary.
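That scoping export can be sketched in a few lines. This is a hypothetical illustration, not AGLedger's actual schema — the `domain` and `risk_level` field names are assumptions:

```python
# Sketch: filtering an exported mandate history for audit scoping.
# Field names (domain, risk_level) are illustrative, not a real schema.

def filter_mandates(mandates, domain=None, risk_level=None):
    """Return only the mandates matching the requested audit scope."""
    result = []
    for m in mandates:
        if domain is not None and m.get("domain") != domain:
            continue
        if risk_level is not None and m.get("risk_level") != risk_level:
            continue
        result.append(m)
    return result

mandates = [
    {"id": "m-001", "domain": "payments", "risk_level": "high"},
    {"id": "m-002", "domain": "support",  "risk_level": "low"},
    {"id": "m-003", "domain": "payments", "risk_level": "low"},
]

# Scope the audit to the payments domain only.
scoped = filter_mandates(mandates, domain="payments")
```

The point is that scoping becomes a query over existing records, not a document someone writes after the fact.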
What you still own: The decisions themselves. Determining organizational context, identifying stakeholder needs, defining the management system scope. The infrastructure records and exports these decisions — it doesn't make them for you.
Clause 5 — Leadership
What the auditor wants: Evidence of leadership commitment, defined roles and responsibilities, and policy establishment. In practice, this means showing who authorized what, when authority was designated, and what scope each role carries.
What structured records provide: Role-based access is recorded per mandate, with distinct principal, performer, and accessor roles. Authority scope and designation dates are captured in the audit trail. Every verdict traces back to a specific principal with a recorded authorization chain. The auditor can see not just that a decision was made, but who had the authority to make it and when that authority was granted.
What you still own: Leadership commitment statements, policy documents, and the actual assignment of roles. The infrastructure enforces and records the role structure — your leadership defines it.
Clause 6 — Planning
What the auditor wants: Evidence that you plan for risks and opportunities, set AI objectives, and document how you'll achieve them. For agent operations, this means demonstrating that constraints, deadlines, and acceptance criteria exist before work begins — not after.
What structured records provide: The mandate structure itself is planning evidence. Every mandate captures objectives, constraints, deadlines, and tolerance bounds before work starts. Risk fields classify the mandate at creation. The mandate is locked once created — the auditor can verify that planning preceded execution by comparing mandate creation timestamps against receipt submission timestamps.
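The timestamp comparison the auditor performs is mechanical. A minimal sketch, assuming ISO 8601 timestamps in fields named `created_at` and `submitted_at` (illustrative names):

```python
# Sketch: verifying that planning preceded execution by comparing
# mandate creation time against receipt submission time.
# Field names are assumptions for illustration.
from datetime import datetime

def planning_preceded_execution(mandate, receipt):
    """True if the mandate was locked before the receipt was submitted."""
    created = datetime.fromisoformat(mandate["created_at"])
    submitted = datetime.fromisoformat(receipt["submitted_at"])
    return created < submitted

mandate = {"id": "m-001", "created_at": "2025-03-01T09:00:00"}
receipt = {"mandate_id": "m-001", "submitted_at": "2025-03-01T14:30:00"}
```

Because the mandate is locked at creation, this check cannot be gamed by editing the plan retroactively.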
What you still own: Risk assessment methodology, AI-specific objectives, and planning decisions. The mandate captures your plan in a structured, auditable format. You define what the plan is.
Clause 7 — Support
What the auditor wants: Evidence that you've provisioned adequate resources, ensured competence, established communication channels, and maintained documented information. This is often where auditors check whether your tools and integrations are actually usable by the people (and agents) who need them.
What structured records provide: Integration through SDKs (TypeScript, Python), a native API, and MCP demonstrates that agents have the tooling to participate in the management system. Documentation exports in JSON, CSV, and NDJSON show that records are accessible in standard, machine-readable formats. The export capability itself is evidence that documented information is maintained and retrievable.

What you still own: Resource allocation, competence requirements, training programs, and communication strategy. The infrastructure is one resource among many — you determine what your team needs to operate the management system.
Clause 8 — Operation
What the auditor wants: This is the big one. Evidence that your AI operations are planned, implemented, and controlled. The auditor wants to see operational records that demonstrate a consistent, repeatable process with documented state transitions. Not a narrative of what should happen — a record of what did happen, across every operation in scope.
What structured records provide: The mandate/receipt/verdict lifecycle with its 17-state machine is operational evidence by design. Every state change is recorded in an append-only audit vault. The auditor can trace any operation from initial commitment through delivery to acceptance or rejection. Ed25519 signatures and SHA-256 hash chaining ensure records haven't been modified after the fact. This is the clause where structured accountability records provide the most direct value — the operational evidence generates itself as agents do the work.
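Hash chaining is what lets the auditor trust that the operational record is intact. The sketch below shows the general SHA-256 chaining technique; the record layout and genesis value are illustrative, and AGLedger's actual vault format may differ:

```python
# Sketch of hash-chain verification for an append-only audit trail.
# Record layout and genesis hash are illustrative assumptions.
import hashlib
import json

def record_hash(record, prev_hash):
    """Hash a record together with its predecessor's hash."""
    payload = json.dumps(record, sort_keys=True).encode() + prev_hash.encode()
    return hashlib.sha256(payload).hexdigest()

def verify_chain(entries):
    """Walk the chain; any modified record breaks every hash after it."""
    prev = "0" * 64  # genesis value (assumption)
    for entry in entries:
        if entry["hash"] != record_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True

# Build a valid two-entry chain.
r1 = {"state": "CREATED", "mandate": "m-001"}
r2 = {"state": "DELIVERED", "mandate": "m-001"}
h1 = record_hash(r1, "0" * 64)
h2 = record_hash(r2, h1)
chain = [{"record": r1, "hash": h1}, {"record": r2, "hash": h2}]
```

A single altered field anywhere in the chain makes verification fail, which is why after-the-fact edits are detectable rather than merely forbidden by policy.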
What you still own: Operational planning decisions, control implementation strategy, and risk treatment execution. The records prove that operations followed a controlled process. You define what that process is and why.
Clause 9 — Performance evaluation
What the auditor wants: Evidence of monitoring, measurement, analysis, and evaluation. Internal audit results. Management review records. The auditor is checking whether you know how well your AI management system is performing — and whether you can prove it with data, not opinions.
What structured records provide: Built-in reputation scoring tracks agent reliability across mandates over time. Cross-mandate compliance attestation records provide measurement data. Drift detection across model updates shows whether agent performance changes when providers update their models. Verdict history (PASS/FAIL rates, timeliness, tolerance compliance) gives the auditor quantitative performance data grounded in structured records, not self-reported metrics.
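Deriving a quantitative metric from verdict history is a one-pass computation. A minimal sketch, assuming each verdict record carries an `outcome` field (an illustrative name):

```python
# Sketch: computing a PASS rate from structured verdict records.
# The verdict shape is an assumption for illustration.

def pass_rate(verdicts):
    """Fraction of verdicts that passed; 0.0 for an empty history."""
    if not verdicts:
        return 0.0
    passed = sum(1 for v in verdicts if v["outcome"] == "PASS")
    return passed / len(verdicts)

verdicts = [
    {"mandate": "m-001", "outcome": "PASS"},
    {"mandate": "m-002", "outcome": "PASS"},
    {"mandate": "m-003", "outcome": "FAIL"},
    {"mandate": "m-004", "outcome": "PASS"},
]
```

The same traversal extends naturally to timeliness and tolerance-compliance metrics: the inputs are records, not recollections.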
What you still own: Monitoring program design, internal audit scope and schedule, management review process. The data is there — you determine what to measure, how often, and what the results mean for your management system.
Clause 10 — Improvement
What the auditor wants: Evidence of nonconformity handling and continual improvement. When something goes wrong, can you show what happened, what you did about it, and whether the corrective action worked? The auditor wants a trail from problem to resolution.
What structured records provide: The 3-tier dispute resolution system records the full chain from failure to resolution. Remediation states and revision workflows document corrective actions. Because the original mandate, the failed receipt, the dispute, and the remediated delivery are all linked in the same audit chain, the auditor gets a complete nonconformity record without anyone having to reconstruct it. The full chain is preserved for root cause analysis.
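Because the records are linked, assembling a nonconformity trail is a lookup, not an investigation. A sketch assuming a flattened export where each record carries a `mandate_id` and a sequence number (both illustrative fields):

```python
# Sketch: assembling a nonconformity trail from linked audit records.
# The record fields (mandate_id, seq, type) are illustrative assumptions.

def nonconformity_trail(records, mandate_id):
    """Collect every record tied to one mandate, in submission order."""
    trail = [r for r in records if r["mandate_id"] == mandate_id]
    return sorted(trail, key=lambda r: r["seq"])

records = [
    {"mandate_id": "m-007", "seq": 1, "type": "mandate"},
    {"mandate_id": "m-007", "seq": 2, "type": "receipt", "verdict": "FAIL"},
    {"mandate_id": "m-007", "seq": 3, "type": "dispute"},
    {"mandate_id": "m-007", "seq": 4, "type": "receipt", "verdict": "PASS"},
    {"mandate_id": "m-008", "seq": 1, "type": "mandate"},
]
trail = nonconformity_trail(records, "m-007")
```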
What you still own: Corrective action decisions and continual improvement strategy. The records show what happened and how it was resolved. You decide what to improve and why.
Export formats for auditor consumption
Auditors work with evidence in different formats depending on the certification body, the scope of the audit, and their own tooling. Structured accountability records export in three formats:
- JSON — Full fidelity. Every field, every signature, every hash-chain link. Best for automated analysis and integration with auditor tooling.
- CSV — Tabular view. One row per mandate or per state transition. Auditors who work in spreadsheets can filter, sort, and cross-reference without specialized tools.
- NDJSON — Newline-delimited JSON. One record per line. Efficient for streaming into log analysis platforms or compliance tooling that processes records sequentially.
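The NDJSON format in particular is trivial to consume incrementally. A minimal sketch of the one-record-per-line pattern, using standard-library JSON parsing:

```python
# Sketch: consuming an NDJSON export one record at a time,
# without loading the whole file into memory.
import io
import json

def iter_ndjson(stream):
    """Yield one parsed record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for a file handle; the record fields are illustrative.
export = io.StringIO(
    '{"id": "m-001", "state": "ACCEPTED"}\n'
    '{"id": "m-002", "state": "DISPUTED"}\n'
)
records = list(iter_ndjson(export))
```

This is why NDJSON suits streaming pipelines: each line is independently parseable, so a log platform can process records as they arrive.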
OCSF v1.4.0 export maps mandate events to standard security event formats, bridging the gap between AI management system evidence and existing security compliance infrastructure.
One set of records, multiple frameworks
ISO 42001 doesn't exist in isolation. If you're preparing for certification, you're almost certainly also tracking requirements from the EU AI Act, NIST AI RMF, or both. The good news: these frameworks overlap significantly. The same structured accountability records that satisfy ISO 42001 clauses also address parallel requirements elsewhere.
ISO 42001 Clause 8 (Operation) maps directly to EU AI Act Article 12 (event logging) and NIST AI RMF MANAGE function (managing AI risks). Clause 9 (Performance evaluation) aligns with NIST MEASURE and EU AI Act Article 9 (risk management). Clause 10 (Improvement) corresponds to EU AI Act Article 20 (corrective actions) and NIST GOVERN subcategories on continual improvement.
The practical implication: if your agents follow a structured accountability protocol, you're not building separate evidence sets for each framework. You're exporting the same underlying records with different filters and mappings. The mandate/receipt/verdict chain is the common evidence layer.
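The clause-to-framework crosswalk described above can itself be represented as data, so one record set is exported with per-framework labels. The identifier strings below are shorthand, not an official mapping schema:

```python
# Sketch: the ISO 42001 / EU AI Act / NIST AI RMF crosswalk as data.
# Identifier strings are informal shorthand, restating the mappings in the text.
CROSSWALK = {
    "ISO42001:Clause8":  ["EU-AI-Act:Art12", "NIST-AI-RMF:MANAGE"],
    "ISO42001:Clause9":  ["NIST-AI-RMF:MEASURE", "EU-AI-Act:Art9"],
    "ISO42001:Clause10": ["EU-AI-Act:Art20", "NIST-AI-RMF:GOVERN"],
}

def frameworks_for(clause):
    """Parallel requirements addressed by evidence for one ISO 42001 clause."""
    return CROSSWALK.get(clause, [])
```

Tagging each exported record with its clause identifier is then enough to generate framework-specific evidence views from the same underlying chain.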
The key insight
ISO 42001 certification is not primarily a documentation exercise. It's an evidence exercise. The documentation (policies, procedures, risk assessments) is necessary but not sufficient. What the auditor ultimately evaluates is whether your management system is operating as documented — and the only way to demonstrate that is with records.
If your AI agents follow a structured accountability protocol, certification evidence is a byproduct of operations. Every mandate is a planning record (Clause 6). Every state transition is an operational record (Clause 8). Every verdict is a performance evaluation data point (Clause 9). Every dispute resolution is an improvement record (Clause 10). The evidence generates itself.
AGLedger provides the infrastructure for generating and exporting this evidence. Your organization provides the management system, the policies, and the decisions. The combination is what the auditor certifies.