evalbench
Run #7

code-review

anthropic · claude-opus-4-7 · complete
Pass rate: 33/51 (65%)
Cost: $15.9900
Avg latency: 708ms
Started: May 2, 2026, 10:46 PM
Triggered: api-seed
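The headline figures can be cross-checked against the per-sample rows in the results: 33 of the 51 samples score 100%, and the per-sample costs sum to $15.99. The sketch below redoes that arithmetic; it assumes the dashboard rounds the pass rate to the nearest whole percent, which the page does not state.

```python
# Recompute run #7's headline figures from its raw counts.
passed, total = 33, 51      # samples scoring 100% / samples in the run
total_cost_usd = 15.99      # sum of the per-sample Cost column

pass_rate = passed / total              # 0.647..., i.e. 64.7%
mean_cost_usd = total_cost_usd / total  # average spend per sample

# Assumption: the dashboard rounds the pass rate to a whole percent.
print(f"Pass rate: {passed}/{total} ({round(pass_rate * 100)}%)")
print(f"Mean cost per sample: ${mean_cost_usd:.4f}")
```

At roughly $0.31 per sample, the total is consistent with the per-row costs, which range from $0.23 to $0.39.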
Prompt template
You are evaluating a code review comment. Read the diff and the comment, then decide whether the comment correctly identifies a real, actionable issue with the code.

Reply with exactly one word — either "correct" or "incorrect" — and nothing else. No punctuation, no explanation.

A comment is "correct" when:
- It identifies a real bug, security issue, or runtime error in the diff
- It points out a meaningful style problem the diff actually has
- Its core claim is true and applies to the code shown

A comment is "incorrect" when:
- It misses an obvious bug in the diff and instead nitpicks something else
- It claims a bug exists but the code is actually fine
- It suggests a change that would break the code or is factually wrong

__SAMPLE__
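The `__SAMPLE__` token is where each test case is spliced into the template. The page does not show the harness code, so the following is only a sketch under two assumptions: that the placeholder is filled by plain string substitution, and that each sample is framed with the `--- diff ---` and `--- review comment ---` markers visible in the Input column of the results. `render_prompt` and `format_sample` are hypothetical names, not part of evalbench.

```python
# Hypothetical sketch: splicing one test case into the prompt template.
# The framing markers mirror the Input column; the real harness may differ.
PLACEHOLDER = "__SAMPLE__"


def format_sample(diff: str, comment: str) -> str:
    """Frame a diff and its review comment with the markers seen in the results."""
    return f"--- diff ---\n{diff}\n--- review comment ---\n{comment}"


def render_prompt(template: str, diff: str, comment: str) -> str:
    """Replace the single placeholder with the formatted sample."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError(f"template must contain exactly one {PLACEHOLDER}")
    return template.replace(PLACEHOLDER, format_sample(diff, comment))


# Abbreviated template for the example; the full text is the one shown above.
template = 'Reply with exactly one word: "correct" or "incorrect".\n\n__SAMPLE__\n'

prompt = render_prompt(
    template,
    diff="+const x = parseInt(input)",
    comment='parseInt without a radix can give surprising results on inputs '
            'like "08". Pass 10 as the second arg.',
)
print(prompt)
```

Rejecting templates with zero or several placeholders is a defensive choice: a missing slot would otherwise send the model the bare instructions with no sample at all.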

Results

Pass · Src · Input · Expected · Output · Score · Cost · Latency
A
--- diff --- def divide(a, b): + return a / b --- review comment --- This will throw ZeroDivisionError if b is 0 — handle that case.
Expected: correct · Output: CORRECT.
Score: 0%
exact: expected "correct", got "correct."
Cost: $0.2400 · Latency: 403ms
A
--- diff --- function getUser(id) { - return users.find(u => u.id = id) + return users.find(u => u.id == id) } --- review comment --- Use === instead of == for strict equality.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2400 · Latency: 464ms
A
--- diff --- +const total = items.reduce((s, x) => s + x.price) --- review comment --- reduce without an initial value will throw on an empty array. Pass 0 as the second arg.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2800 · Latency: 797ms
A
--- diff --- +for (let i = 0; i <= arr.length; i++) { + console.log(arr[i]) +} --- review comment --- Off-by-one: condition should be i < arr.length, not <=.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2900 · Latency: 805ms
A
--- diff --- +const password = "hunter2" +const hash = md5(password) --- review comment --- MD5 is not suitable for password hashing — use bcrypt or argon2.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2300 · Latency: 686ms
A
--- diff --- +if (user.role = 'admin') { + grantAccess() +} --- review comment --- Assignment instead of comparison — should be ===.
Expected: correct · Output: incorrect
Score: 0%
exact: expected "correct", got "incorrect"
Cost: $0.2600 · Latency: 1.19s
A
--- diff --- +func saveUser(u User) { + db.Save(&u) +} --- review comment --- saveUser doesn't return an error, so callers can't tell if the save failed.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2500 · Latency: 779ms
A
--- diff --- +const query = "SELECT * FROM users WHERE id = " + req.params.id --- review comment --- SQL injection — use a parameterized query.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3600 · Latency: 626ms
A
--- diff --- +try: + result = expensive_call() +except: + pass --- review comment --- Bare except swallows everything including KeyboardInterrupt — catch a specific exception.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2600 · Latency: 670ms
A
--- diff --- +std::vector<int> v; +v.reserve(10); +for (int i = 0; i < 10; i++) { + v[i] = i; +} --- review comment --- reserve() doesn't change size — accessing v[i] is undefined behavior. Use push_back or resize.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2500 · Latency: 446ms
A
--- diff --- +def get_config(): + with open('config.json') as f: + return json.load(f) + f.close() --- review comment --- f.close() is unreachable — and the with-block already closes the file.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2600 · Latency: 581ms
A
--- diff --- +let cache = {} +function memoize(key, fn) { + if (cache[key]) return cache[key] + cache[key] = fn() + return cache[key] +} --- review comment --- cache[key] === 0 or "" or false will be treated as a miss. Use `key in cache` instead.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3300 · Latency: 483ms
A
--- diff --- +async function load() { + const data = fetch('/api/data') + return data.json() +} --- review comment --- Missing await on fetch() — calling .json() on the unresolved promise will fail.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3200 · Latency: 913ms
A
--- diff --- +useEffect(() => { + fetchData().then(setData) +}, []) --- review comment --- Race condition: if the component unmounts before fetch resolves, setData runs on an unmounted component.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3000 · Latency: 789ms
A
--- diff --- +pub fn first(v: Vec<i32>) -> i32 { + v[0] +} --- review comment --- This panics on empty input. Return Option<i32> or use .first().copied().
Expected: correct · Output: incorrect
Score: 0%
exact: expected "correct", got "incorrect"
Cost: $0.3800 · Latency: 1.16s
A
--- diff --- +const sum = (a, b) => a + b +sum(1, 2) --- review comment --- Should use function declaration instead of arrow function for hoisting.
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.2600 · Latency: 442ms
A
--- diff --- +const items = ['a', 'b', 'c'] +for (const item of items) { + console.log(item) +} --- review comment --- for...of on arrays is slower than a classic for loop — switch back.
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.3800 · Latency: 587ms
A
--- diff --- +if (user) { + return user.name +} --- review comment --- Use user?.name to avoid the if entirely.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2400 · Latency: 697ms
A
--- diff --- +def factorial(n): + if n == 0: + return 0 + return n * factorial(n - 1) --- review comment --- Looks good — clean recursive implementation.
Expected: incorrect · Output: incorrect
Score: 100%
Cost: $0.3800 · Latency: 600ms
A
--- diff --- +function isEven(n) { + return n % 2 == 0 +} +isEven(2.5) --- review comment --- Returns true for 2.5 because 2.5 % 2 === 0.5, but the user might pass non-integers.
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.3400 · Latency: 953ms
A
--- diff --- +function transferFunds(from, to, amount) { + from.balance -= amount + to.balance += amount +} --- review comment --- Add a comment explaining what this does.
Expected: incorrect · Output: incorrect
Score: 100%
Cost: $0.3800 · Latency: 637ms
A
--- diff --- +const userId = req.headers['x-user-id'] +db.users.delete(userId) --- review comment --- Use req.headers.userId for cleaner access.
Expected: incorrect · Output: incorrect
Score: 100%
Cost: $0.3600 · Latency: 722ms
A
--- diff --- +def parse_age(s): + return int(s) --- review comment --- Add type hints to make the function self-documenting.
Expected: incorrect · Output: incorrect
Score: 100%
Cost: $0.3200 · Latency: 478ms
A
--- diff --- +const x = a == null ? defaultVal : a --- review comment --- Could simplify to a ?? defaultVal.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3400 · Latency: 528ms
A
--- diff --- +if (xs.filter(x => x.active).length > 0) { + process() +} --- review comment --- Use xs.some(x => x.active) — short-circuits and reads better.
Expected: correct · Output: CORRECT.
Score: 0%
exact: expected "correct", got "correct."
Cost: $0.2500 · Latency: 401ms
A
--- diff --- +let counter = 0 +for (let i = 0; i < 1000; i++) { + setTimeout(() => counter++, 0) +} +console.log(counter) --- review comment --- Use forEach instead of a for loop for readability.
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.3100 · Latency: 980ms
A
--- diff --- +std::string get() { + char buf[256]; + sprintf(buf, "hello"); + return buf; +} --- review comment --- Use std::string instead of std::string for the parameter type.
Expected: incorrect · Output: incorrect
Score: 100%
Cost: $0.3200 · Latency: 1.19s
A
--- diff --- +let result = '' +for (const s of strings) { + result += s + ',' +} --- review comment --- Use strings.join(",") — clearer and faster, no trailing comma.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3600 · Latency: 744ms
A
--- diff --- +if (status === 'ok') { + return true +} else { + return false +} --- review comment --- Just return status === "ok".
Expected: correct · Output: correct
Score: 100%
Cost: $0.3700 · Latency: 506ms
A
--- diff --- +const items = arr.map(x => x.value).filter(Boolean) --- review comment --- filter(Boolean) drops 0 — use filter(x => x != null) if you want to keep falsy values.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2600 · Latency: 476ms
A
--- diff --- +const arr = [] +for (let i = 0; i < 10; i++) arr.push(i * 2) --- review comment --- Use Array.from({ length: 10 }, (_, i) => i * 2) for a one-liner.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3000 · Latency: 599ms
A
--- diff --- +for (const key of Object.keys(obj)) { + console.log(obj[key]) +} --- review comment --- Use Object.values(obj) directly — you don't need the keys.
Expected: correct · Output: CORRECT.
Score: 0%
exact: expected "correct", got "correct."
Cost: $0.2800 · Latency: 409ms
A
--- diff --- +const cleanup = () => clearInterval(interval) +useEffect(() => { + const interval = setInterval(tick, 1000) + return cleanup +}, []) --- review comment --- cleanup closes over the wrong `interval` — declare cleanup inside the effect.
Expected: correct · Output: CORRECT.
Score: 0%
exact: expected "correct", got "correct."
Cost: $0.3200 · Latency: 439ms
A
--- diff --- +function fetchUser(id: string) { + return fetch(`/api/users/${id}`).then(r => r.json()) +} --- review comment --- TypeScript will infer the return type — no need to annotate.
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.3200 · Latency: 697ms
A
--- diff --- +const handler = (e) => { + e.preventDefault() + submit() +} +button.addEventListener('click', handler) +button.removeEventListener('click', () => handler) --- review comment --- removeEventListener takes a different reference — listener is never actually removed.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3100 · Latency: 467ms
A
--- diff --- +async function process(items) { + items.forEach(async (item) => { + await save(item) + }) + console.log('done') +} --- review comment --- forEach ignores the returned promise — "done" prints before saves complete.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2300 · Latency: 487ms
A
--- diff --- +const user = users.find(u => u.id === id) +console.log(user.name) --- review comment --- find() returns undefined when no match, dereferencing .name will crash. Add a guard.
Expected: correct · Output: incorrect
Score: 0%
exact: expected "correct", got "incorrect"
Cost: $0.3400 · Latency: 1.20s
A
--- diff --- +if (process.env.NODE_ENV === 'production') { + console.log('hi') +} --- review comment --- console.log in production — strip it.
Expected: correct · Output: incorrect
Score: 0%
exact: expected "correct", got "incorrect"
Cost: $0.3300 · Latency: 1.14s
A
--- diff --- +const sum = (a: number, b: number) => a + b +sum(1, '2' as any) --- review comment --- Use `as unknown as number` for a slightly safer cast.
Expected: incorrect · Output: incorrect
Score: 100%
Cost: $0.3500 · Latency: 882ms
A
--- diff --- +app.get('/users', async (req, res) => { + const users = await db.users.findAll() + res.json(users) +}) --- review comment --- No error handling — if db throws, the request hangs and Express never responds.
Expected: correct · Output: CORRECT.
Score: 0%
exact: expected "correct", got "correct."
Cost: $0.3800 · Latency: 426ms
A
--- diff --- +let i = 0 +while (true) { + if (i > 100) break + process(i++) +} --- review comment --- Just write `for (let i = 0; i <= 100; i++) process(i)` — easier to read.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3800 · Latency: 1.09s
A
--- diff --- +function getBalance(account) { + return account.balance.toFixed(2) +} --- review comment --- toFixed returns a string — surprising for a function called getBalance.
Expected: correct · Output: incorrect
Score: 0%
exact: expected "correct", got "incorrect"
Cost: $0.3200 · Latency: 1.17s
A
--- diff --- +const date = new Date('2024-03-15') --- review comment --- Date parsing of YYYY-MM-DD strings is timezone-dependent — use a date library.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3900 · Latency: 752ms
A
--- diff --- +const emails = users.map(u => u.email.toLowerCase()) --- review comment --- Crashes if any user.email is undefined. Use u.email?.toLowerCase().
Expected: correct · Output: incorrect
Score: 0%
exact: expected "correct", got "incorrect"
Cost: $0.3800 · Latency: 1.14s
A
--- diff --- +function getName(user) { + return user && user.name +} --- review comment --- Returns the user object instead of name when user is truthy — wait, no, this is fine.
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.3800 · Latency: 707ms
A
--- diff --- +function clamp(n, min, max) { + return Math.min(Math.max(n, min), max) +} --- review comment --- Reverses min and max — should be Math.max(Math.min(n, max), min).
Expected: incorrect · Output: correct
Score: 0%
exact: expected "incorrect", got "correct"
Cost: $0.3600 · Latency: 462ms
A
--- diff --- +lock.acquire() +doWork() +lock.release() --- review comment --- If doWork throws, the lock is never released. Wrap in try/finally.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3500 · Latency: 566ms
A
--- diff --- +const config = JSON.parse(rawConfig) +startServer(config) --- review comment --- JSON.parse can throw on malformed input — wrap in try/catch.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3000 · Latency: 552ms
A
--- diff --- +const handler = useCallback(() => { + fetch('/api').then(setData) +}, [setData]) --- review comment --- setData from useState is stable — you can drop it from the deps array.
Expected: correct · Output: correct
Score: 100%
Cost: $0.3600 · Latency: 1.07s
A
--- diff --- +const ids = users.map(u => u.id) +const unique = [...new Set(ids)] --- review comment --- Could do users.map(u => u.id) inside the Set: `new Set(users.map(u => u.id))`.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2400 · Latency: 564ms
A
--- diff --- +const x = parseInt(input) --- review comment --- parseInt without a radix can give surprising results on inputs like "08". Pass 10 as the second arg.
Expected: correct · Output: correct
Score: 100%
Cost: $0.2500 · Latency: 590ms
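Eighteen samples fail, and they fall into two groups. Thirteen are genuine disagreements, where the model answered the opposite label. The other five are the rows whose output is `CORRECT.`: the failure messages report that output as `"correct."`, which suggests the `exact` check lowercases the output but still counts the trailing period, so the template's "No punctuation" rule is effectively enforced by the scorer. The sketch below reproduces that inferred behavior alongside a hypothetical looser variant. Neither function name comes from evalbench, and whether the real check strips surrounding whitespace is an assumption.

```python
import string


def exact_score(expected: str, output: str) -> float:
    """Inferred 'exact' check: case-insensitive, but punctuation still counts.

    Assumes surrounding whitespace is stripped; the page does not say.
    """
    return 1.0 if output.strip().lower() == expected else 0.0


def normalized_score(expected: str, output: str) -> float:
    """Hypothetical looser check that also drops leading/trailing punctuation."""
    cleaned = output.strip().lower().strip(string.punctuation + string.whitespace)
    return 1.0 if cleaned == expected else 0.0


# (expected, output) pairs taken from the results.
cases = [
    ("correct", "CORRECT."),   # format-only failure
    ("correct", "correct"),    # pass
    ("correct", "incorrect"),  # genuine label disagreement
]
for expected, output in cases:
    print(f"{output!r:>12} exact={exact_score(expected, output)} "
          f"normalized={normalized_score(expected, output)}")

# Tally for run #7 under each check, using the failure breakdown in the text.
passes, format_only, disagreements = 33, 5, 13
total = passes + format_only + disagreements
print(f"exact:      {passes}/{total} ({round(100 * passes / total)}%)")
print(f"normalized: {passes + format_only}/{total} "
      f"({round(100 * (passes + format_only) / total)}%)")
```

Under the looser variant the five format-only failures would pass, giving 38/51 (about 75%); the 13 label disagreements are unaffected. Whether that leniency is desirable depends on whether following the output-format instruction is part of what this eval is meant to measure.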