North Coast Synthesis Ltd.

Tag: tools

The BLEU sausage

2025-12-29 Video translation evaluation text tools

Every paper has an "evaluation" table showing how the paper's new idea gives greater numbers than previous work in the same domain; but where do those numbers actually come from? Here we look at BLEU, a classic measurement for evaluating the quality of machine translation.

Access: $ Basic

Catching cheaters with ImpossibleBench

2025-11-10 Video code alignment applications prompting tools

Agentic models used for software engineering often cheat by modifying the tests, or by writing code to the tests rather than the spec, so that the code passes the tests without actually being correct. This talk covers ImpossibleBench, a new dataset intended to help catch cheating by giving models tests that cannot be passed honestly.

Access: $ Basic

Data for testing logical inference

2025-09-08 Video training tools text logic

This short paper introduces a dataset, or rather software for generating one, to test language models' handling of chains of logical inference.

Access: $ Basic