Catching cheaters with ImpossibleBench
◀ Prev | 2025-11-10, access: $ Basic | Next ▶
code alignment applications prompting tools Agentic models used for software engineering often cheat by modifying the tests, or writing code to the tests rather than the spec, so that it will pass the tests without actually being correct. This talk covers ImpossibleBench, a new dataset intended to help catch cheating by giving models tests that cannot be passed honestly.
