Betting on sycophancy
2026-01-26
Tags: evaluation, hallucination, text

Chat models have a well-known tendency toward sycophancy: affirming the user's beliefs even when the user is wrong. This effect, however, is confounded with several other effects. In this paper, the authors attempt to isolate sycophancy by framing questions as a zero-sum game or bet between two humans.
