Suites/toy
Smoke test: 5 trivially answerable cases for the exact scorer
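As a rough illustration of what an exact scorer over 5 cases implies for the pass rates on this page, here is a minimal sketch. The function names `exact_match` and `pass_rate`, the whitespace trimming, and the sample cases are all assumptions made for illustration; the suite's actual scorer and cases are not shown here. With 5 cases, each case is worth 20 percentage points, so 80% corresponds to 4 of 5 passing and 20% to 1 of 5.

```python
def exact_match(expected: str, actual: str) -> bool:
    """Pass only when the two strings are equal after trimming whitespace.

    Trimming is an assumption; a stricter scorer might compare raw strings.
    """
    return expected.strip() == actual.strip()


def pass_rate(cases: list[tuple[str, str]]) -> float:
    """Fraction of (expected, actual) pairs that the scorer accepts."""
    if not cases:
        return 0.0
    passed = sum(exact_match(expected, actual) for expected, actual in cases)
    return passed / len(cases)


# Five made-up cases (hypothetical, not the suite's real data):
# four answers match the expected string and one does not.
cases = [
    ("4", "4"),
    ("Paris", "Paris"),
    ("blue", "blue "),   # passes only because of the trimming assumption
    ("7", "seven"),      # fails: same meaning, different string
    ("yes", "yes"),
]

print(f"{pass_rate(cases):.0%}")  # prints 80%
```

An exact comparison is cheap and deterministic, which suits a smoke test, but it will mark a semantically correct answer as a failure when the wording differs, as the `"7"` versus `"seven"` case shows.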
Pass rate over time
Latest by model
| Model | Pass rate | Cost / case | Latency | Runs |
|---|---|---|---|---|
| anthropic/claude-opus-4-7 | 80% | $0.2900 | 837ms | 1 |
| openai/gpt-5 | 80% | $0.1500 | 837ms | 1 |
| anthropic/claude-haiku-4-5 | 20% | $0.0500 | 584ms | 1 |
| google/gemini-1.5-flash | 20% | $0.0100 | 401ms | 1 |
| google/gemini-2.5-pro | 20% | $0.0400 | 584ms | 1 |
| openai/gpt-4o-mini | 20% | $0.0200 | 401ms | 1 |
Recent runs
| # | Model | Pass rate | Started | Status |
|---|---|---|---|---|
| 6 | google/gemini-1.5-flash | 20% | May 2, 2026, 10:46 PM | complete |
| 5 | openai/gpt-4o-mini | 20% | May 2, 2026, 10:46 PM | complete |
| 4 | google/gemini-2.5-pro | 20% | May 2, 2026, 10:46 PM | complete |
| 3 | anthropic/claude-haiku-4-5 | 20% | May 2, 2026, 10:46 PM | complete |
| 2 | openai/gpt-5 | 80% | May 2, 2026, 10:46 PM | complete |
| 1 | anthropic/claude-opus-4-7 | 80% | May 2, 2026, 10:46 PM | complete |