evalbench

Runs

Filtered by suite: code-review

| # | Suite | Model | Pass rate | Cost | Latency | Branch | Started | Status |
|---|---|---|---|---|---|---|---|---|
| 12 | code-review | google/gemini-1.5-flash | 43% (22/51) | $0.5100 | 306ms | | May 2, 2026, 10:46 PM | complete |
| 11 | code-review | openai/gpt-4o-mini | 43% (22/51) | $1.0200 | 306ms | | May 2, 2026, 10:46 PM | complete |
| 10 | code-review | google/gemini-2.5-pro | 71% (36/51) | $2.0200 | 497ms | | May 2, 2026, 10:46 PM | complete |
| 9 | code-review | anthropic/claude-haiku-4-5 | 71% (36/51) | $2.5200 | 497ms | | May 2, 2026, 10:46 PM | complete |
| 8 | code-review | openai/gpt-5 | 65% (33/51) | $7.0200 | 708ms | | May 2, 2026, 10:46 PM | complete |
| 7 | code-review | anthropic/claude-opus-4-7 | 65% (33/51) | $15.9900 | 708ms | | May 2, 2026, 10:46 PM | complete |
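The Pass rate column is derived from the passed/total counts shown for each run. The following is a minimal sketch that recomputes each percentage from those counts. It assumes the dashboard rounds passed/total to the nearest whole percent (half up); the `runs` list and the `pass_rate_percent` helper are hypothetical illustrations, not part of evalbench.

```python
# Hypothetical sketch: recompute the "Pass rate" column from passed/total counts.
# Assumption: pass rate = passed / total, rounded half up to a whole percent.

runs = [
    # (run #, model, passed, total), copied from the runs table
    (12, "google/gemini-1.5-flash", 22, 51),
    (11, "openai/gpt-4o-mini", 22, 51),
    (10, "google/gemini-2.5-pro", 36, 51),
    (9, "anthropic/claude-haiku-4-5", 36, 51),
    (8, "openai/gpt-5", 33, 51),
    (7, "anthropic/claude-opus-4-7", 33, 51),
]


def pass_rate_percent(passed: int, total: int) -> int:
    """Return passed/total as a whole-number percentage, rounding half up."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    # floor(100 * p / t + 1/2) == floor((200 * p + t) / (2 * t)); integer math
    # avoids float error and Python's round-half-to-even behaviour in round().
    return (200 * passed + total) // (2 * total)


for run_id, model, passed, total in runs:
    print(f"{run_id:>2}  {model:<28} {pass_rate_percent(passed, total)}%  {passed}/{total}")
```

Under that rounding assumption every row reproduces the displayed value; for example, 36/51 is about 70.6%, which rounds to 71%.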