Linear attention
2026-04-20
AIAYN attention image math theory

Standard (softmax) attention in transformers costs time and memory quadratic in the context length, and that limits how large the context window can be. This paper modifies the attention mechanism to reduce that cost to linear, making much larger windows practical.
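A minimal sketch of the idea, assuming the common kernel-feature-map construction: replace the softmax similarity `exp(q·k)` with an inner product of feature maps `phi(q)·phi(k)`, then reassociate the matrix products so the (n, n) attention matrix is never materialized. The `phi` used here (ReLU plus a small constant, to keep values positive) is one illustrative choice, not necessarily the one this paper proposes.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (n, n) score matrix costs O(n^2) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized attention: with similarity phi(q)·phi(k), reassociate
    # (Qp Kp^T) V into Qp (Kp^T V). The (d, d_v) summary Kp^T V is
    # independent of sequence length, so cost is linear in n.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d_v) summary of keys and values
    Z = Qp @ Kp.sum(axis=0)       # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

The outputs differ numerically from softmax attention (the kernel only approximates the exponential similarity); the point is the reassociation, which drops the per-step cost from O(n^2 d) to O(n d^2).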
