Evidence patterns for the UK AI Action Plan
Five reusable operational evidence patterns that satisfy the UK AI Action Plan's accountability expectations — for second-line, governance and risk-committee reporting.
Contents
- What the UK AI Action Plan actually asks for
- Why operational evidence beats policy attestation
- The five evidence patterns
- Pattern 1 — Provenance
- Pattern 2 — Decision rationale
- Pattern 3 — Outcome
- Pattern 4 — Human review
- Pattern 5 — Drift
- Composition: from patterns to second-line packs
- Mapping to existing controls
What the UK AI Action Plan actually asks for
The UK AI Action Plan deliberately avoids prescribing a single control catalogue. Instead, it sets expectations around transparency, contestability, accountability and the demonstrable safety of high-impact systems. For firms operating AI agents in regulated customer journeys, that wording is liberating and confusing in equal measure: liberating because it lets you choose how you demonstrate compliance; confusing because nobody hands you a checklist.
This paper argues that the practical answer is operational evidence — structured records produced at agent runtime that, taken together, satisfy every accountability expectation the Plan articulates. We propose five reusable patterns that compose into second-line reporting and, when needed, into a regulator-facing pack.
Why operational evidence beats policy attestation
A policy attestation says: 'we have a policy that says agents must do X'. An operational record says: 'on Tuesday at 14:32, this agent did X in response to this customer, and here is the proof'. Regulators, increasingly, want the second kind of evidence. So do internal risk committees.
Operational evidence has three properties that policy attestation lacks. It is contemporaneous (produced at the time of the decision, not reconstructed after the fact). It is granular (per-decision, not per-policy). And it is verifiable (the record references the underlying system state, not a human's recollection).
The five evidence patterns
Across financial services, digital health and public sector engagements, we have converged on five patterns that, between them, satisfy the Action Plan's expectations without bespoke per-regulator instrumentation. Each pattern is a structured record produced by the agent runtime, captured into the AgentAudit audit trail, and exposed to second-line teams as a queryable feed.
- Provenance — where the agent's instruction came from and who approved it
- Decision rationale — the agent's recorded reasoning at decision time
- Outcome — the customer-visible result and any downstream effect
- Human review — which decisions were reviewed, by whom, with what conclusion
- Drift — behavioural drift vs the certified baseline, with thresholds
Pattern 1 — Provenance
Provenance answers: where did this agent's behaviour come from? It records the instruction graph (system prompt, tool definitions, retrieval index version), the configuration version, and the approval chain that authorised each. When the agent does something a customer questions, provenance lets the firm reconstruct what the agent was 'told' to do at that moment.
Provenance is the foundation pattern; the other four sit on top of it. A decision rationale without provenance is unfalsifiable — you cannot tell whether the agent reasoned correctly given its instructions if you cannot recover what its instructions were.
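As an illustration only, a provenance record might be shaped like the Python sketch below. The `ProvenanceRecord` class, its field names and the `provenance_id` helper are hypothetical and are not AgentAudit's schema. The point is that hashing a canonical form of the instruction graph gives every later record a stable identifier to point back to.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance record; field names are assumptions."""
    agent_id: str
    system_prompt_sha256: str
    tool_definitions_sha256: str
    retrieval_index_version: str
    config_version: str
    approval_chain: Tuple[str, ...]  # approvers, in order of sign-off


def sha256_text(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def provenance_id(record: ProvenanceRecord) -> str:
    """Stable identifier: hash of the record's canonical JSON form."""
    canonical = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return sha256_text(canonical)
```

Two records with identical content yield the same identifier, and any change to the prompt, tools, index or configuration yields a different one, which is what lets a rationale record cite the exact instructions in force at decision time.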
Pattern 2 — Decision rationale
Decision rationale captures the agent's recorded reasoning at the moment a customer-facing decision is made. It is not 'whatever the model said in its chain of thought' — that is rarely reliable. It is a structured, model-graded rationale produced against a fixed schema: 'inputs considered, options weighed, decision selected, policy basis cited'.
Rationale capture imposes a small latency tax. We have found that customers tolerate it for high-stakes decisions (loan eligibility, clinical triage) and reject it for low-stakes ones (information retrieval). Apply selectively.
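A minimal sketch of what a fixed rationale schema and selective capture could look like, in Python. The class, the validation rules and the `HIGH_STAKES_DECISIONS` set are assumptions made for illustration; in particular, the rule that the selected decision must appear among the options weighed is our own suggestion, not something the Action Plan requires.

```python
from dataclasses import dataclass
from typing import List

# Assumed examples of decision types that warrant the latency cost of capture.
HIGH_STAKES_DECISIONS = {"loan_eligibility", "clinical_triage"}


@dataclass(frozen=True)
class DecisionRationale:
    """Hypothetical rationale record mirroring the four schema fields."""
    decision_id: str
    provenance_id: str  # link back to the provenance record in force
    inputs_considered: List[str]
    options_weighed: List[str]
    decision_selected: str
    policy_basis: List[str]


def requires_rationale(decision_type: str) -> bool:
    """Capture rationale only for decision types flagged as high-stakes."""
    return decision_type in HIGH_STAKES_DECISIONS


def validate_rationale(r: DecisionRationale) -> List[str]:
    """Return a list of schema problems; an empty list means the record is usable."""
    problems: List[str] = []
    if not r.inputs_considered:
        problems.append("inputs_considered is empty")
    if not r.options_weighed:
        problems.append("options_weighed is empty")
    if r.decision_selected not in r.options_weighed:
        problems.append("decision_selected is not one of options_weighed")
    if not r.policy_basis:
        problems.append("policy_basis is empty")
    return problems
```

Returning a list of problems rather than raising on the first one keeps the check cheap to run inline and lets an incomplete rationale be stored and flagged rather than silently dropped.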
Pattern 3 — Outcome
Outcome records the customer-visible result of the agent's decision and any downstream effect: the response shown, the tool calls made, the records written, and any downstream human process triggered. This is the pattern that connects 'the agent decided' to 'the customer experienced'.
For Consumer Duty in financial services, outcome capture is the operational backbone of good-outcomes evidencing. For MHRA post-market surveillance in digital health, it is the backbone of adverse-event signal generation.
Pattern 4 — Human review
Human review records which agent decisions were reviewed by a human, who the reviewer was, what their conclusion was, and what action followed. It also records the inverse: where a sampling policy required review and none was performed — a gap to be closed.
We see firms confuse 'human in the loop' with 'human review'. The former is a workflow choice; the latter is an evidence record. A workflow with a human approval step that does not produce a review record produces no evidence.
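The inverse record (review required but not performed) can be derived mechanically if the sampling policy is deterministic. The sketch below assumes a hash-based sampling rule, which is one possible policy and not one the paper prescribes; the function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def review_required(decision_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: the same decision always gets the same answer.

    Maps the decision ID to a bucket in [0, 1) and selects it when the
    bucket falls below the sample rate.
    """
    digest = hashlib.sha256(decision_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32
    return bucket < sample_rate


def find_review_gaps(
    decision_ids: Iterable[str],
    reviewed_ids: Set[str],
    sample_rate: float,
) -> List[str]:
    """Decisions the policy selected for review that have no review record."""
    return sorted(
        d for d in decision_ids
        if review_required(d, sample_rate) and d not in reviewed_ids
    )
```

Because selection depends only on the decision ID and the rate, a second-line team can recompute which decisions should have been reviewed without trusting the workflow tool that ran the sampling.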
Pattern 5 — Drift
Drift records the behavioural distance between today's agent and the certified baseline. It is the time-series companion to evaluation: where evaluation says 'the agent passes the harness now', drift says 'and here is how its behaviour has moved since certification'.
Drift is the pattern second-line teams use to ask the most useful question in this domain: 'should we be re-certifying?'. A drift threshold breach triggers re-evaluation, not deployment rollback — drift detection without re-evaluation produces noise.
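One simple way to make 'behavioural distance' concrete is to compare the distribution of decision outcomes at certification with the distribution observed now. The sketch below uses total variation distance as the metric; that choice, the threshold value a firm sets, and the function names are all assumptions for illustration, and real deployments may prefer richer behavioural measures.

```python
from collections import Counter
from typing import Dict, Sequence


def outcome_distribution(outcomes: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each outcome label."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two distributions; ranges from 0 to 1."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)


def drift_action(distance: float, threshold: float) -> str:
    """A breach triggers re-evaluation, not an automatic rollback."""
    return "re-evaluate" if distance > threshold else "none"
```

For example, a baseline of 80% 'approve' and 20% 'refer' against a current mix of 50% each gives a distance of 0.3, which would breach a threshold of 0.2 and queue the agent for re-evaluation against the harness.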
Composition: from patterns to second-line packs
Individually, each pattern is a record. Composed, they form a second-line pack: 'for this agent, in this reporting period, against this framework, here are the provenance, rationale, outcome, review and drift records, with cross-references'. AgentAudit composes these packs deterministically; second-line teams configure the cross-references once per agent and reuse them across reporting cycles.
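To show what deterministic composition means in practice, here is a hedged Python sketch. It is not AgentAudit's implementation: the record shape (dictionaries with `record_id`, `agent_id`, `pattern`, `timestamp` and an optional `decision_id`) and the `compose_pack` function are hypothetical, and timestamps are assumed to be ISO 8601 UTC strings in a single format so that they compare correctly as text.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List

PATTERNS = ("provenance", "rationale", "outcome", "review", "drift")


def compose_pack(
    records: Iterable[dict],
    agent_id: str,
    period_start: str,
    period_end: str,
    framework: str,
) -> str:
    """Build a pack as canonical JSON; input order does not affect the output."""
    by_pattern: Dict[str, List[str]] = {p: [] for p in PATTERNS}
    cross_refs: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        if rec["agent_id"] != agent_id:
            continue
        # Half-open period: start inclusive, end exclusive.
        if not (period_start <= rec["timestamp"] < period_end):
            continue
        if rec["pattern"] not in by_pattern:
            raise ValueError("unknown pattern: %r" % rec["pattern"])
        by_pattern[rec["pattern"]].append(rec["record_id"])
        if rec.get("decision_id"):
            cross_refs[rec["decision_id"]].append(rec["record_id"])
    pack = {
        "agent_id": agent_id,
        "framework": framework,
        "period": [period_start, period_end],
        "records": {p: sorted(ids) for p, ids in by_pattern.items()},
        "cross_references": {d: sorted(ids) for d, ids in cross_refs.items()},
    }
    return json.dumps(pack, sort_keys=True, indent=2)
```

Sorting identifiers and serialising with sorted keys means the same set of records always produces byte-identical output, so a pack can be regenerated and compared against the version previously submitted.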
Mapping to existing controls
The patterns map cleanly to existing control frameworks: ISO/IEC 42001 management-system controls, NIST AI RMF functions, the ICO AI Auditing Framework checklist, and the EU AI Act Article 12 logging requirements for high-risk systems. We publish the mapping tables alongside the platform; this paper covers the operational patterns, not the framework crosswalks.
Takeaway
Operational evidence — produced at runtime, structured, verifiable — beats policy attestation. Five patterns compose into every regulator-facing pack we have shipped.