North Coast Synthesis Ltd.

Catching cheaters with ImpossibleBench

2025-11-10, access: $ Basic

Agentic models used for software engineering often cheat by modifying the tests, or by writing code to the tests rather than to the spec, so that their output passes the tests without actually being correct. This talk covers ImpossibleBench, a new dataset intended to help catch cheating by giving models tests that cannot be passed honestly.

Tags: Video, code, alignment, applications, prompting, tools