National Cyber Warfare Foundation (NCWF)

[Thread] A new US paper shows the best frontier LLMs achieve 0% on hard real-life programming contest problems, domains where expert humans still excel

2025-06-17 10:06:26
milo
Developers

Rohan Paul / @rohanpaul_ai:

[Thread] This is really bad news for LLMs' coding skills. ☹️ The best frontier LLMs achieve 0% on hard real-life programming contest problems, domains where expert humans still excel. LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI ("International ...

Source: TechMeme
Source Link: http://www.techmeme.com/250617/p7#a250617p7

Copyright 2012 through 2026 - National Cyber Warfare Foundation - All rights reserved worldwide.