evalbench
Runs/#6

toy

google / gemini-1.5-flash (complete)
Pass rate: 1/5 (20%)
Cost: $0.0500
Avg latency: 401ms
Started: May 2, 2026, 10:46 PM
Triggered: api-seed
Prompt template
Answer with a single lowercase word and nothing else. No punctuation, no quotes, no explanation.

Question: __SAMPLE__
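
How the `__SAMPLE__` placeholder gets filled is not shown on this page. A minimal sketch, assuming each row's input is substituted into the template by plain string replacement, might look like the following. The `render_prompt` helper and the `PROMPT_TEMPLATE` constant are hypothetical names for illustration, not part of evalbench.

```python
# Template text copied from the run page; __SAMPLE__ marks where the input goes.
PROMPT_TEMPLATE = (
    "Answer with a single lowercase word and nothing else. "
    "No punctuation, no quotes, no explanation.\n"
    "\n"
    "Question: __SAMPLE__"
)


def render_prompt(template: str, sample: str) -> str:
    """Substitute one dataset input for the placeholder (hypothetical helper)."""
    if "__SAMPLE__" not in template:
        raise ValueError("template has no __SAMPLE__ placeholder")
    return template.replace("__SAMPLE__", sample)


prompt = render_prompt(PROMPT_TEMPLATE, "Capital of France?")
print(prompt)
```

Failing loudly when the placeholder is missing is a design choice of this sketch: a template without it would send the same prompt for every row.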

Results

Pass | Src | Input | Expected | Output | Score | Cost | Latency
pass | A | Capital of France? | paris | paris | 100% | $0.0100 | 278ms
fail | A | What color is the sky on a clear day? | blue | gray | 0% (exact: expected "blue", got "gray") | $0.0100 | 342ms
fail | A | What animal says "meow"? | cat | dog | 0% (exact: expected "cat", got "dog") | $0.0100 | 421ms
fail | A | What is the opposite of hot? | cold | warm | 0% (exact: expected "cold", got "warm") | $0.0100 | 469ms
fail | A | How many legs does a spider have? Answer as a written-out number. | eight | six | 0% (exact: expected "eight", got "six") | $0.0100 | 494ms
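
The summary figures can be cross-checked against the per-row results. The sketch below assumes the `exact` scorer is a plain string equality test (the page only shows its failure messages, so the real scorer may normalize whitespace or case) and recomputes pass rate, total cost, and average latency. `Row` and `exact_score` are hypothetical names, not evalbench's API.

```python
from dataclasses import dataclass


@dataclass
class Row:
    expected: str
    output: str
    cost_usd: float
    latency_ms: int


def exact_score(expected: str, output: str) -> float:
    """Return 1.0 on an exact string match, else 0.0 (assumed scorer semantics)."""
    return 1.0 if output == expected else 0.0


# Values copied from the results for this run.
ROWS = [
    Row("paris", "paris", 0.01, 278),
    Row("blue", "gray", 0.01, 342),
    Row("cat", "dog", 0.01, 421),
    Row("cold", "warm", 0.01, 469),
    Row("eight", "six", 0.01, 494),
]

passed = sum(exact_score(r.expected, r.output) == 1.0 for r in ROWS)
pass_rate = passed / len(ROWS)
total_cost = sum(r.cost_usd for r in ROWS)
avg_latency = sum(r.latency_ms for r in ROWS) / len(ROWS)

print(f"Pass rate: {passed}/{len(ROWS)} ({pass_rate:.0%})")
print(f"Cost: ${total_cost:.4f}")
print(f"Avg latency: {avg_latency:.0f}ms")
```

The exact mean latency is 400.8ms, which rounds to the 401ms reported in the run summary.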