Tag: training
Goldfish loss
2025-10-27 training theory text Apertus copyright It may be a problem for text models to generate exact quotes from training data. This paper looks at a simple modification to the training loss function, intended to prevent models from being able to generate exact quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance." Access: Free account (logged in)
Quis custodiet reward models
2025-09-29 alignment training text LLaMA Gemma Large language models are "aligned" using smaller, specially trained reward models. These are often secret, and poorly studied even if public. This paper opens the door to exploring reward models by asking them about their values. Access: Free account (logged in)
Data for testing logical inference
2025-09-08 training tools text logic This short paper introduces a dataset, or software for generating such, to test language models' handling of chains of logical inference Access: $ Basic