Tag: memorization
Memorization and generalization
2026-01-19 · Tags: memorization, text, theory, training
How much arbitrary information, like random bits, can a language model memorize during training? This paper suggests the answer is 3.6 bits per parameter.
Access: Free account (logged in)
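To put the headline figure in perspective, a quick back-of-the-envelope calculation; the model size below is my own illustrative assumption, not a number from the paper:

```python
# Rough reading of the 3.6 bits/parameter figure. The 8B model size is an
# arbitrary example chosen for illustration, not taken from the paper.
bits_per_param = 3.6
params = 8e9                                   # hypothetical 8-billion-parameter model
capacity_bytes = bits_per_param * params / 8   # bits -> bytes
print(f"~{capacity_bytes / 1e9:.1f} GB of arbitrary data")   # ~3.6 GB
```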
Goldfish loss
2025-10-27 · Tags: training, theory, text, Apertus, copyright, memorization
It can be a problem when text models generate exact quotes from their training data. This paper describes a simple modification to the training loss function intended to prevent models from being able to reproduce such verbatim quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance." Access: Free account (logged in)
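As a rough illustration of the kind of loss modification involved, here is a minimal sketch in the goldfish spirit: a deterministic, hash-based rule drops a fraction of token positions from the cross-entropy loss, so no passage is ever trained on in full. The masking rule, window size, and drop rate below are my own assumptions for illustration, not the paper's exact choices.

```python
# Illustrative sketch of a goldfish-style masked loss (not the paper's exact
# implementation): deterministically exclude ~1-in-k token positions from the
# next-token cross-entropy so the model never fits a complete verbatim passage.
import hashlib
import torch
import torch.nn.functional as F

def goldfish_mask(input_ids: torch.Tensor, k: int = 4, context: int = 8) -> torch.Tensor:
    """Return a 0/1 mask over positions; roughly 1/k of positions are dropped (0)."""
    batch, seq_len = input_ids.shape
    mask = torch.ones(batch, seq_len)
    for b in range(batch):
        for t in range(seq_len):
            # Hash a small window of preceding token ids so the drop decision is a
            # deterministic function of local context: the same text drops the same
            # positions every time it is seen, even across epochs or duplicates.
            window = input_ids[b, max(0, t - context): t + 1].tolist()
            digest = hashlib.sha256(str(window).encode("utf-8")).digest()
            if digest[0] % k == 0:
                mask[b, t] = 0.0
    return mask

def goldfish_loss(logits: torch.Tensor, labels: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Cross-entropy averaged only over kept positions (labels assumed already shifted)."""
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).reshape(labels.shape)
    kept = mask.to(per_token.dtype)
    return (per_token * kept).sum() / kept.sum().clamp(min=1.0)
```

In a training loop the mask would be computed from the input ids before the forward pass and applied as above; since a fixed fraction of tokens never contributes to the gradient, the model cannot learn any long span of text exactly.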