North Coast Synthesis Ltd.

Memorization and generalization

2026-01-19

How much arbitrary information, like random bits, can a language model memorize during training?  This paper suggests the answer is 3.6 bits per parameter.
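As a back-of-the-envelope illustration of that figure, here is a minimal sketch converting the paper's 3.6 bits-per-parameter estimate into a total capacity for a given model size (the constant is from the paper; the model size and helper function are illustrative assumptions):

```python
# Rough memorization-capacity estimate at ~3.6 bits per parameter
# (the constant comes from the paper discussed above; everything else
# here is an illustrative assumption, not the paper's own code).
BITS_PER_PARAM = 3.6

def memorization_capacity_bytes(num_params: int) -> float:
    """Estimated raw memorization capacity in bytes."""
    return num_params * BITS_PER_PARAM / 8

# A 1-billion-parameter model would top out around 450 MB of raw bits.
print(f"{memorization_capacity_bytes(1_000_000_000) / 1e6:.0f} MB")
```

By this rough accounting, a model's capacity for arbitrary (incompressible) data scales linearly with parameter count, which is what makes the per-parameter figure a useful summary statistic.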
