Acme Learning/
Dashboard
Eval Studio
Track FdemoWrite evals, golden sets, and regression tests for AI agents. Run them, see what breaks.
running
Eval suites
5
active
310 goldens total
Pass rate
90.3%
+1.4
Across all suites
Avg runtime
1m 41s
/run
Cached + parallel
Regressions caught
2
today
Caught before merge
Eval suites
Each suite is a versioned set of goldens. Pass requires all hard rubrics to clear and 0 regressions vs the last green run.
Agents
support-agent / golden / v3119
4
1
38s4m agoRAG
rag-recall / chunk-strategy65
12
3
2m 12s11m agoSafety
prompt-injection / red-team26
5
1
1m 04s1h agoTool Use
tool-call / arg-validation56
0
0
22s2h agoVoice
voice-agent / latency budget14
2
2
4m 30s3h agoRegressions caught today
Cases that were passing on the last green run and are now failing. The grader explains *why*.
rag-recall·long-table-mid-page regressionchunk size raised from 800→1200 broke table boundaries
support-agent·angry-billing-tier-2 regressionescalation threshold drift in v3 prompt