Memorization and generalization
2026-01-19
Tags: memorization, text, theory, training

How much arbitrary information, such as random bits, can a language model memorize during training? This paper estimates the answer at roughly 3.6 bits per parameter.
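As a back-of-the-envelope illustration (not from the paper itself, which only supplies the per-parameter figure), the estimate implies a total capacity you can compute directly; the function name and model size below are hypothetical:

```python
# Rough capacity implied by the paper's ~3.6 bits-per-parameter estimate.
BITS_PER_PARAM = 3.6  # figure reported in the paper

def capacity_megabytes(n_params: float) -> float:
    """Total memorization capacity in megabytes for a model with n_params parameters."""
    total_bits = n_params * BITS_PER_PARAM
    return total_bits / 8 / 1e6  # bits -> bytes -> megabytes

# A 1B-parameter model could memorize on the order of 450 MB of arbitrary data.
print(capacity_megabytes(1e9))  # → 450.0
```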
