Why drift detection on agent estates needs behavioural metrics, not just embeddings
A practitioner's view on the limits of embedding-distance drift and the case for behavioural assertions.
Embeddings tell you something changed. They don't tell you what.
Embedding-distance drift detection is popular because it is cheap and universal. It is also unactionable: a 0.12 cosine shift on yesterday's customer cohort tells an operator nothing they can take to a risk committee.
What the operator actually needs to know is whether the agent's behaviour, with respect to the assertions the firm cares about, has moved. Embedding distance is a proxy for that question, and a poor one — two outputs can be embedding-near and behaviourally opposite (the agent went from 'always discloses' to 'never discloses', with the same surface vocabulary).
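A toy illustration of that failure mode, offered as a sketch rather than a benchmark. It uses bag-of-words cosine distance as a crude stand-in for a learned embedding (an assumption made so that the example runs without a model), and `discloses_fees` is a hypothetical string check standing in for a real behavioural assertion. Both example outputs are invented.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def discloses_fees(output: str) -> bool:
    """Hypothetical assertion: the agent commits to sharing the fee schedule."""
    return "will share the fee schedule" in output.lower()


# Invented outputs: same surface vocabulary, opposite behaviour.
before = "I will share the fee schedule with you before I recommend a product."
after = "I will not share the fee schedule with you before I recommend a product."

distance = cosine_distance(bow_vector(before), bow_vector(after))
print(f"distance={distance:.3f}")
print(f"assertion before={discloses_fees(before)} after={discloses_fees(after)}")
```

The distance between the two outputs is small while the assertion flips from pass to fail. A learned embedding will not behave exactly like word counts, but the example shows why a distance number alone cannot say which behaviour moved.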
Behavioural assertions are the unit of drift
If you express the agent's intended behaviour as a small set of assertions ('always discloses the fee schedule before recommending a product', 'never recommends a product the customer is ineligible for'), drift becomes a change in the rate at which those assertions fail on production traffic. That number is defensible: second-line teams can interpret it without a degree in geometry.
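A minimal sketch of that idea, assuming each production interaction has already been annotated with the facts the assertions need. The `Interaction` fields, the assertion names, and the sample data are all hypothetical; in practice the annotation step (rules, a grader model, human review) is the hard part and is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Interaction:
    """Hypothetical annotated record of one production interaction."""
    fee_schedule_disclosed_first: bool
    recommended_product: Optional[str]
    eligible_products: FrozenSet[str]


Assertion = Callable[[Interaction], bool]


def discloses_fees_before_recommending(x: Interaction) -> bool:
    # Vacuously true when nothing was recommended.
    return x.recommended_product is None or x.fee_schedule_disclosed_first


def never_recommends_ineligible(x: Interaction) -> bool:
    return x.recommended_product is None or x.recommended_product in x.eligible_products


ASSERTIONS: Dict[str, Assertion] = {
    "discloses_fees_before_recommending": discloses_fees_before_recommending,
    "never_recommends_ineligible": never_recommends_ineligible,
}


def failure_rates(
    interactions: List[Interaction], assertions: Dict[str, Assertion]
) -> Dict[str, float]:
    """Fraction of interactions on which each assertion fails."""
    if not interactions:
        raise ValueError("need at least one interaction")
    total = len(interactions)
    return {
        name: sum(not check(x) for x in interactions) / total
        for name, check in assertions.items()
    }


# Invented sample of four interactions.
sample = [
    Interaction(True, "gold", frozenset({"gold", "basic"})),
    Interaction(False, "basic", frozenset({"basic"})),   # fees not disclosed
    Interaction(True, "gold", frozenset({"basic"})),     # ineligible product
    Interaction(False, None, frozenset({"basic"})),      # no recommendation
]
rates = failure_rates(sample, ASSERTIONS)
print(rates)
```

Keeping each assertion as a named predicate means the reported number maps directly onto a sentence a second-line reviewer can read.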
The operational pattern: re-run a subset of the agent's harness against a daily sample of production traffic, plot the per-assertion pass rate, and set threshold alerts on category-level drops. That dashboard is the artefact the risk committee actually wants.
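The aggregation and alerting steps might look something like the sketch below. It assumes the harness emits one `Result` row per assertion per sampled interaction; the `Result` shape, the `max_drop` threshold of 0.05, and the mean-of-earlier-days baseline are illustrative choices, not a recommendation. The data is invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Result(NamedTuple):
    """Hypothetical harness output: one assertion checked on one interaction."""
    day: date
    assertion: str
    passed: bool


def daily_pass_rates(results: Iterable[Result]) -> Dict[str, Dict[date, float]]:
    """Return {assertion: {day: pass_rate}} with days in ascending order."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [passes, total]
    for r in results:
        cell = counts[r.assertion][r.day]
        cell[0] += int(r.passed)
        cell[1] += 1
    return {
        name: {d: p / n for d, (p, n) in sorted(days.items())}
        for name, days in counts.items()
    }


def drop_alerts(
    rates: Dict[str, Dict[date, float]], max_drop: float = 0.05
) -> List[Tuple[str, float, float]]:
    """Flag series whose latest pass rate is more than max_drop below
    the mean of all earlier days. Returns (name, baseline, latest)."""
    alerts = []
    for name, by_day in rates.items():
        days = sorted(by_day)
        if len(days) < 2:
            continue  # no baseline yet
        baseline = mean(by_day[d] for d in days[:-1])
        latest = by_day[days[-1]]
        if baseline - latest > max_drop:
            alerts.append((name, baseline, latest))
    return alerts


# Invented three-day sample, ten interactions per assertion per day.
results = []
for day, fee_passes in [(date(2024, 1, 1), 10), (date(2024, 1, 2), 10), (date(2024, 1, 3), 8)]:
    results += [Result(day, "fee_disclosure", i < fee_passes) for i in range(10)]
    results += [Result(day, "eligibility", True) for _ in range(10)]

pass_rates = daily_pass_rates(results)
alerts = drop_alerts(pass_rates)
print(alerts)
```

To alert at category level instead, one option is to relabel each `Result` with its category name before aggregating, so the same two functions apply unchanged. The `pass_rates` dictionary is also the series a dashboard would plot.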