evalbench

Runs

Filtered by suite: toy

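| # | Suite | Model | Pass rate | Cost | Latency | Branch | Started | Status |
|---|-------|-------|-----------|------|---------|--------|---------|--------|
| 6 | toy | google gemini-1.5-flash | 20% (1/5) | $0.0500 | 401ms | | May 2, 2026, 10:46 PM | complete |
| 5 | toy | openai gpt-4o-mini | 20% (1/5) | $0.1000 | 401ms | | May 2, 2026, 10:46 PM | complete |
| 4 | toy | google gemini-2.5-pro | 20% (1/5) | $0.2200 | 584ms | | May 2, 2026, 10:46 PM | complete |
| 3 | toy | anthropic claude-haiku-4-5 | 20% (1/5) | $0.2500 | 584ms | | May 2, 2026, 10:46 PM | complete |
| 2 | toy | openai gpt-5 | 80% (4/5) | $0.7400 | 837ms | | May 2, 2026, 10:46 PM | complete |
| 1 | toy | anthropic claude-opus-4-7 | 80% (4/5) | $1.4500 | 837ms | | May 2, 2026, 10:46 PM | complete |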