Memorization and generalization
2026-01-19
Tags: memorization, text, theory, training

How much arbitrary information, such as random bits, can a language model memorize during training? This paper estimates the answer at roughly 3.6 bits per parameter.
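As a back-of-the-envelope illustration (not from the paper itself, which only supplies the per-parameter figure), the estimate implies a total capacity you can compute directly; the function name and model size below are hypothetical:

```python
# Rough capacity implied by the paper's ~3.6 bits-per-parameter estimate.
BITS_PER_PARAM = 3.6  # figure reported in the paper

def capacity_megabytes(n_params: float) -> float:
    """Total memorization capacity in megabytes for a model with n_params parameters."""
    total_bits = n_params * BITS_PER_PARAM
    return total_bits / 8 / 1e6  # bits -> bytes -> megabytes

# A 1B-parameter model could memorize on the order of 450 MB of arbitrary data.
print(capacity_megabytes(1e9))  # → 450.0
```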
