evalbench
Runs/#2

toy

openai gpt-5 complete
Pass rate: 4/5 (80%)
Cost: $0.7400
Avg latency: 837ms
Started: May 2, 2026, 10:46 PM
Triggered: api-seed
Prompt template
Answer with a single lowercase word and nothing else. No punctuation, no quotes, no explanation.

Question: __SAMPLE__
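
The template above contains a `__SAMPLE__` placeholder, which is presumably replaced by each row's input before the prompt is sent to the model. The page does not show how evalbench performs this step, so the following is only a minimal sketch assuming plain string substitution; `render_prompt` and `PROMPT_TEMPLATE` are hypothetical names used for illustration.

```python
# Hypothetical sketch: the page does not show how evalbench fills the template.
# Plain string substitution of the __SAMPLE__ placeholder is an assumption.
PROMPT_TEMPLATE = (
    "Answer with a single lowercase word and nothing else. "
    "No punctuation, no quotes, no explanation.\n"
    "\n"
    "Question: __SAMPLE__"
)


def render_prompt(template: str, sample: str) -> str:
    """Substitute one dataset input for the __SAMPLE__ placeholder."""
    if "__SAMPLE__" not in template:
        raise ValueError("template has no __SAMPLE__ placeholder")
    return template.replace("__SAMPLE__", sample)


if __name__ == "__main__":
    # Render the prompt for the first input listed in the results.
    print(render_prompt(PROMPT_TEMPLATE, 'What animal says "meow"?'))
```

Checking for the placeholder up front means a template that lost its `__SAMPLE__` marker fails loudly instead of sending the same question-less prompt for every row.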

Results

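| Pass | Src | Input | Expected | Output | Score | Cost | Latency |
|------|-----|-------|----------|--------|-------|------|---------|
| yes | A | What animal says "meow"? | cat | cat | 100% | $0.1200 | 551ms |
| yes | A | Capital of France? | paris | paris | 100% | $0.1400 | 627ms |
| yes | A | What color is the sky on a clear day? | blue | blue | 100% | $0.1700 | 789ms |
| yes | A | What is the opposite of hot? | cold | cold | 100% | $0.1700 | 1.05s |
| no | A | How many legs does a spider have? Answer as a written-out number. | eight | six | 0% | $0.1400 | 1.17s |

Failure detail (last row): exact: expected "eight", got "six"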
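
The `exact:` prefix on the failure message suggests an exact-match scorer, and the summary figures (4/5, $0.7400, 837ms) can be reproduced from the per-row values. The sketch below shows one way to do that. It is not evalbench's code: `Row`, `exact_match`, and `summarize` are hypothetical names, strict string equality is an assumption, and the two latencies shown in seconds (1.05s, 1.17s) are taken as 1050 ms and 1170 ms even though the page rounds them.

```python
from dataclasses import dataclass


@dataclass
class Row:
    expected: str
    output: str
    cost_usd: float
    latency_ms: int


def exact_match(expected: str, output: str) -> float:
    """Score 1.0 only when the output equals the expected string exactly.

    Assumption: no trimming or case folding is applied before comparing.
    """
    return 1.0 if output == expected else 0.0


# Per-row values copied from the results on this page.
# 1.05s and 1.17s are converted to 1050 ms and 1170 ms (rounded on the page).
ROWS = [
    Row("cat", "cat", 0.12, 551),
    Row("paris", "paris", 0.14, 627),
    Row("blue", "blue", 0.17, 789),
    Row("cold", "cold", 0.17, 1050),
    Row("eight", "six", 0.14, 1170),
]


def summarize(rows: list[Row]) -> tuple[int, int, float, int]:
    """Return (passed, total, total cost in USD, average latency in ms)."""
    passed = sum(exact_match(r.expected, r.output) == 1.0 for r in rows)
    total_cost = round(sum(r.cost_usd for r in rows), 4)
    avg_latency = round(sum(r.latency_ms for r in rows) / len(rows))
    return passed, len(rows), total_cost, avg_latency


if __name__ == "__main__":
    passed, total, cost, latency = summarize(ROWS)
    print(f"Pass rate {passed}/{total} ({passed / total:.0%})")
    print(f"Cost ${cost:.4f}")
    print(f"Avg latency {latency}ms")
```

Under these assumptions the totals agree with the run header: four of five rows match, the costs sum to $0.7400, and the mean latency rounds to 837ms.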