AI ROI & Metrics — how to measure real value
Most AI projects fail on measurement, not tech. Here's what to track to know if it's actually working and to justify scaling the budget.
The biggest cause of death for AI projects: unmeasurable value. Executive excitement drives the POC, then 6 months later there's no data to justify the next budget, so the project stalls. Measure from day one.
**4 metric families** to track for any AI project: (1) **Quality**: does it work? Accuracy, F1, CSAT on AI answers. (2) **Efficiency**: does it save time or money? Minutes saved per task, cost per query, deflection rate. (3) **Adoption**: do people use it? DAU, % of eligible tasks actually using AI, time-to-engagement. (4) **Risk**: what's going wrong? Error rate, incident count, complaint volume.
ROI calculation pattern: net value = value created − cost, where value created = time saved × hourly rate + errors avoided × cost per error + new revenue, and cost = API calls + infra + engineering time + change management. Do the math upfront (hypothesis) and again at 3 months (measured). If hypothesis and measurement diverge by 3× or more, treat it as a calibration to learn from, not a failure.
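A minimal sketch of the calculation above in Python. The class, field, and function names are illustrative, the figures are made up for the example, and comparing hypothesis to measurement on value created (rather than on net value) is an assumption about how to read "diverge 3×".

```python
from dataclasses import dataclass


@dataclass
class RoiInputs:
    """Annual figures for one AI use case (field names are illustrative)."""
    hours_saved: float
    hourly_rate: float
    errors_avoided: float
    cost_per_error: float
    new_revenue: float
    api_cost: float
    infra_cost: float
    engineering_cost: float
    change_mgmt_cost: float


def value_created(x: RoiInputs) -> float:
    """Time saved x hourly rate + errors avoided x cost per error + new revenue."""
    return (x.hours_saved * x.hourly_rate
            + x.errors_avoided * x.cost_per_error
            + x.new_revenue)


def total_cost(x: RoiInputs) -> float:
    """API calls + infra + engineering time + change management."""
    return x.api_cost + x.infra_cost + x.engineering_cost + x.change_mgmt_cost


def net_value(x: RoiInputs) -> float:
    return value_created(x) - total_cost(x)


def divergence(hypothesis_value: float, measured_value: float) -> float:
    """Ratio between the larger and the smaller of two positive values."""
    if hypothesis_value <= 0 or measured_value <= 0:
        raise ValueError("both values must be positive")
    return (max(hypothesis_value, measured_value)
            / min(hypothesis_value, measured_value))


# Made-up numbers, for illustration only.
hypothesis = RoiInputs(hours_saved=6000, hourly_rate=50, errors_avoided=200,
                       cost_per_error=100, new_revenue=0, api_cost=30_000,
                       infra_cost=10_000, engineering_cost=120_000,
                       change_mgmt_cost=20_000)
measured = RoiInputs(hours_saved=1800, hourly_rate=50, errors_avoided=100,
                     cost_per_error=100, new_revenue=0, api_cost=30_000,
                     infra_cost=10_000, engineering_cost=120_000,
                     change_mgmt_cost=20_000)

ratio = divergence(value_created(hypothesis), value_created(measured))
needs_recalibration = ratio >= 3  # the 3x threshold comes from the text
print(net_value(hypothesis), net_value(measured), round(ratio, 2),
      needs_recalibration)
```

Keeping hypothesis and measurement in the same structure makes the 3-month comparison a like-for-like check: the same formula runs on both, and only the inputs change.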
Typical value benchmarks (for sizing the business case): customer support deflection of 20-40% of volume is worth hundreds of k€/year for a mid-size company. Internal Q&A (HR/IT/legal bots): 10-15 min saved × 1,000 queries/week is about 4-6 FTE of gross time at 40 h/week; budget for 1-2 FTE actually saved once displaced time is discounted. Document extraction: 80% automation × 10k docs/month = 2-5 FTE. Sales email drafting: 30% faster = ~5 h/week per sales rep.
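The arithmetic behind an FTE estimate is worth making explicit when sizing a business case. The sketch below converts minutes saved per task into FTE, assuming a 40-hour week, and applies a hypothetical `realization_rate` (the share of saved time that is actually redeployed to useful work, an assumption not given in the text) to go from gross to realized savings. Function names are illustrative.

```python
HOURS_PER_FTE_WEEK = 40.0  # assumption: one FTE works a 40-hour week


def gross_fte_saved(minutes_saved_per_task: float,
                    tasks_per_week: float,
                    hours_per_fte_week: float = HOURS_PER_FTE_WEEK) -> float:
    """Convert per-task minutes saved into FTE, before any discount."""
    hours_saved_per_week = minutes_saved_per_task * tasks_per_week / 60.0
    return hours_saved_per_week / hours_per_fte_week


def realized_fte_saved(gross_fte: float, realization_rate: float) -> float:
    """Discount gross FTE by the share of saved time that is redeployed."""
    if not 0.0 <= realization_rate <= 1.0:
        raise ValueError("realization_rate must be between 0 and 1")
    return gross_fte * realization_rate


# Internal Q&A example from the text: 10-15 min saved x 1000 queries/week.
low_gross = gross_fte_saved(10, 1000)
high_gross = gross_fte_saved(15, 1000)

# Hypothetical 30% realization rate, for illustration only.
low_realized = realized_fte_saved(low_gross, 0.3)
high_realized = realized_fte_saved(high_gross, 0.3)
print(round(low_gross, 2), round(high_gross, 2),
      round(low_realized, 2), round(high_realized, 2))
```

With these assumptions the gross saving is roughly 4 to 6 FTE, and a 30% realization rate brings it into the 1-2 FTE range, which is why the discount for displaced time belongs in the business case from the start.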
Watch out — the traps: (i) vanity metrics ('the bot answered 10k questions' — without CSAT, useless); (ii) displaced time ('I saved 10 min' but that time went to scrolling, not to higher value); (iii) hidden costs (engineering maintenance, prompt iteration, eval set upkeep — typically 20-30% of initial build); (iv) sample bias (the N=20 who love it ≠ the 500 eligible users).
The leading indicator of success: user adoption at 90 days. If fewer than 30% of eligible people use it regularly, technology is not the problem; change management is. Rework UX, onboarding, and integrations with existing tools (Slack, CRM, email).
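One way to make the 90-day check concrete is sketched below. The text does not define "regularly", so the definition used here (active on at least 4 distinct days in the trailing 28 days) is an assumption, as are the function name, the data shape, and the sample data.

```python
from datetime import date, timedelta

ADOPTION_THRESHOLD = 0.30  # the 30% threshold comes from the text


def adoption_rate(eligible_users: set[str],
                  usage_days: dict[str, set[date]],
                  window_end: date,
                  window_days: int = 28,
                  min_active_days: int = 4) -> float:
    """Share of eligible users active on >= min_active_days in the window."""
    if not eligible_users:
        raise ValueError("eligible_users must not be empty")
    window_start = window_end - timedelta(days=window_days - 1)
    regular = 0
    for user in eligible_users:
        days = usage_days.get(user, set())
        active = sum(1 for d in days if window_start <= d <= window_end)
        if active >= min_active_days:
            regular += 1
    return regular / len(eligible_users)


# Made-up data: 10 eligible users, measured 90 days after launch.
launch = date(2025, 1, 6)
day_90 = launch + timedelta(days=90)
eligible = {f"u{i}" for i in range(10)}
usage = {
    "u0": {day_90 - timedelta(days=k) for k in range(5)},  # 5 active days
    "u1": {day_90 - timedelta(days=k) for k in range(5)},  # 5 active days
    "u2": {day_90 - timedelta(days=k) for k in range(2)},  # only 2 days
}

rate = adoption_rate(eligible, usage, day_90)
needs_change_management = rate < ADOPTION_THRESHOLD
print(rate, needs_change_management)
```

The denominator is the full eligible population, not the users who signed up, which guards against the sample-bias trap of measuring only the enthusiasts.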
Grounded on https://www.anthropic.com/customers