evalbench
Runs/#1

toy

anthropic · claude-opus-4-7 · complete
Pass rate: 4/5 (80%)
Cost: $1.4500
Avg latency: 837ms
Started: May 2, 2026, 10:46 PM
Triggered: api-seed
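
As a sanity check, the summary figures above can be recomputed from the per-sample rows in the Results table. The sketch below is not evalbench code. The `rows` list and variable names are hypothetical, and it assumes the latencies shown as 1.05s and 1.17s are exactly 1050 ms and 1170 ms, although the UI may have rounded them.

```python
# Per-sample values copied from the Results table: (score, cost_usd, latency_ms).
# Assumption: 1.05s and 1.17s are taken as exactly 1050 ms and 1170 ms.
rows = [
    (1.0, 0.25, 551),
    (1.0, 0.31, 627),
    (1.0, 0.30, 789),
    (1.0, 0.31, 1050),
    (0.0, 0.28, 1170),
]

total = len(rows)
passed = sum(1 for score, _, _ in rows if score == 1.0)
total_cost = sum(cost for _, cost, _ in rows)
avg_latency_ms = sum(latency for _, _, latency in rows) / total

print(f"Pass rate: {passed}/{total} ({passed / total:.0%})")
print(f"Cost: ${total_cost:.4f}")
print(f"Avg latency: {avg_latency_ms:.0f}ms")
```

Under these assumptions the recomputed values agree with the reported 4/5 (80%), $1.4500, and 837ms.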
Prompt template
Answer with a single lowercase word and nothing else. No punctuation, no quotes, no explanation.

Question: __SAMPLE__
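
One plausible way the `__SAMPLE__` placeholder is filled is plain string substitution, sketched below. `render_prompt` is a hypothetical helper, not evalbench's actual API, and the report does not say whether any escaping or trimming is applied to the sample.

```python
# Template text copied from the report; the rendering logic is an assumption.
PROMPT_TEMPLATE = (
    "Answer with a single lowercase word and nothing else. "
    "No punctuation, no quotes, no explanation.\n"
    "\n"
    "Question: __SAMPLE__"
)


def render_prompt(template: str, sample: str, placeholder: str = "__SAMPLE__") -> str:
    """Substitute the dataset input for the placeholder (hypothetical helper)."""
    if placeholder not in template:
        raise ValueError(f"template has no {placeholder} placeholder")
    return template.replace(placeholder, sample)


prompt = render_prompt(PROMPT_TEMPLATE, 'What animal says "meow"?')
print(prompt)
```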

Results

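| Pass | Src | Input | Expected | Output | Score | Cost | Latency |
|------|-----|-------|----------|--------|-------|------|---------|
| pass | A | What animal says "meow"? | cat | cat | 100% | $0.2500 | 551ms |
| pass | A | Capital of France? | paris | paris | 100% | $0.3100 | 627ms |
| pass | A | What color is the sky on a clear day? | blue | blue | 100% | $0.3000 | 789ms |
| pass | A | What is the opposite of hot? | cold | cold | 100% | $0.3100 | 1.05s |
| fail | A | How many legs does a spider have? Answer as a written-out number. | eight | six | 0% (exact: expected "eight", got "six") | $0.2800 | 1.17s |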
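
The failure message on the last sample suggests an exact-match scorer. A minimal sketch of such a scorer follows, assuming a strict string comparison with no case folding or whitespace trimming. `exact_match` is a hypothetical name, and the real evalbench scorer may normalize its inputs differently.

```python
from typing import Optional, Tuple


def exact_match(expected: str, output: str) -> Tuple[float, Optional[str]]:
    """Return (score, failure_message); the score is 1.0 only on an exact match."""
    if output == expected:
        return 1.0, None
    # Message format mirrors the one shown in the report.
    return 0.0, f'exact: expected "{expected}", got "{output}"'


for expected, output in [("cat", "cat"), ("eight", "six")]:
    score, message = exact_match(expected, output)
    print(f"{score:.0%}", message or "")
```

With the strict comparison assumed here, an output such as "Eight" or "eight." would also score 0%, which is presumably why the prompt template asks for a single lowercase word with no punctuation.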