evalbench
Run #5

toy

openai gpt-4o-mini complete
Pass rate: 1/5 (20%)
Cost: $0.1000
Avg latency: 401ms
Started: May 2, 2026, 10:46 PM
Triggered: api-seed
Prompt template
Answer with a single lowercase word and nothing else. No punctuation, no quotes, no explanation.

Question: __SAMPLE__
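
The template above contains a `__SAMPLE__` placeholder that is presumably replaced by each input before the request is sent. A minimal sketch of that step, assuming plain string substitution; `render_prompt` is a hypothetical helper name, not part of evalbench's documented API:

```python
# The prompt template as shown on this page.
PROMPT_TEMPLATE = (
    "Answer with a single lowercase word and nothing else. "
    "No punctuation, no quotes, no explanation.\n"
    "\n"
    "Question: __SAMPLE__"
)


def render_prompt(template: str, sample: str) -> str:
    """Fill the __SAMPLE__ placeholder with one input (assumed behaviour)."""
    if "__SAMPLE__" not in template:
        raise ValueError("template has no __SAMPLE__ placeholder")
    return template.replace("__SAMPLE__", sample)


prompt = render_prompt(PROMPT_TEMPLATE, "Capital of France?")
print(prompt)
```

Raising on a missing placeholder is a design choice in this sketch: it keeps a malformed template from silently sending the same prompt for every sample.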

Results

| Pass | Src | Input | Expected | Output | Score | Cost | Latency |
|---|---|---|---|---|---|---|---|
| pass | A | Capital of France? | paris | paris | 100% | $0.0200 | 278ms |
| fail | A | What color is the sky on a clear day? | blue | gray | 0% (exact: expected "blue", got "gray") | $0.0200 | 342ms |
| fail | A | What animal says "meow"? | cat | dog | 0% (exact: expected "cat", got "dog") | $0.0200 | 421ms |
| fail | A | What is the opposite of hot? | cold | warm | 0% (exact: expected "cold", got "warm") | $0.0200 | 469ms |
| fail | A | How many legs does a spider have? Answer as a written-out number. | eight | six | 0% (exact: expected "eight", got "six") | $0.0200 | 494ms |
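
The summary figures at the top of the run can be reproduced from the five results above. The sketch below assumes the `exact` scorer is strict string equality (whether evalbench trims whitespace or normalizes case is not shown on this page); the `Result` class and `exact_score` function are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Result:
    expected: str
    output: str
    cost_usd: float
    latency_ms: int


def exact_score(expected: str, output: str) -> float:
    """Return 1.0 on an exact string match, else 0.0 (assumed scorer behaviour)."""
    return 1.0 if output == expected else 0.0


# Values copied from the results listed on this page.
results = [
    Result("paris", "paris", 0.02, 278),
    Result("blue", "gray", 0.02, 342),
    Result("cat", "dog", 0.02, 421),
    Result("cold", "warm", 0.02, 469),
    Result("eight", "six", 0.02, 494),
]

passed = sum(exact_score(r.expected, r.output) == 1.0 for r in results)
pass_rate = passed / len(results)
total_cost = sum(r.cost_usd for r in results)
avg_latency = round(sum(r.latency_ms for r in results) / len(results))

print(f"Pass rate: {passed}/{len(results)} ({pass_rate:.0%})")  # Pass rate: 1/5 (20%)
print(f"Cost: ${total_cost:.4f}")                               # Cost: $0.1000
print(f"Avg latency: {avg_latency}ms")                          # Avg latency: 401ms
```

The average latency works out to 400.8ms before rounding, which matches the 401ms reported in the run summary.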