North Coast Synthesis Ltd.

Tag: attention

Cognitive Heads

2026-06-01 | Video | Tags: interpretation, CoT, attention, text
The heads in a multi-head attention transformer tend to specialize in different functions. We can find the cognitive heads, which are responsible for individual steps in chains of thought, by borrowing the techniques biologists use to study animal brains.
Access: $$$ Pro

Linear attention

2026-04-20 | Video | Tags: AIAYN, attention, image, math, theory
The simplest kind of attention mechanism for transformers uses space and computation quadratic in the length of the context window, and that limits how large the window can be. This paper looks at modifying the attention mechanism to reduce that bound to linear, making much larger windows practical.
Access: $ Basic
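
The quadratic-versus-linear contrast comes down to the order of a matrix product. Below is a minimal pure-Python sketch, assuming the kernel feature-map form of linear attention with φ(x) = elu(x) + 1 in place of softmax. That is one common linearization, not necessarily the exact method of the paper discussed in the video, and the function names are hypothetical. Both functions compute the same normalized kernel attention: the first builds the full n-by-n similarity matrix, while the second reassociates the product so each key and value is visited only once.

```python
import math

def phi(x):
    # Hypothetical feature map elu(x) + 1, applied elementwise; outputs are always positive.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def quadratic_kernel_attention(Q, K, V):
    # Explicit n-by-n similarity matrix A: O(n^2) time and memory.
    fQ = [phi(q) for q in Q]
    fK = [phi(k) for k in K]
    A = [[dot(fq, fk) for fk in fK] for fq in fQ]
    out = []
    for row in A:
        z = sum(row)
        out.append([sum(w * v[c] for w, v in zip(row, V)) / z for c in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    # Same result, reassociated: phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V.
    # The d_k-by-d_v summary S and the normalizer z are built in one pass over
    # the keys, so the cost grows linearly with the sequence length n.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom for c in range(d_v)])
    return out

if __name__ == "__main__":
    Q = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
    K = [[0.3, 0.1], [-0.1, 0.4], [0.2, -0.3]]
    V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    a = quadratic_kernel_attention(Q, K, V)
    b = linear_attention(Q, K, V)
    print(all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))  # True
```

Softmax itself cannot be split apart this way, which is why linear variants replace it with a feature map whose dot products stand in for the attention scores.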

Vision Transformers

2025-12-01 | Video | Tags: AIAYN, BERT, attention, basics, image
What if we applied the "Attention is all you need" architecture to images instead of language? That's the question considered in this 2021 paper, which laid the groundwork for today's multimodal models.
Access: $ Basic

Embeddings from generative models

2025-08-01 | Video | Tags: theory, applications, attention, Mistral
For text generation you usually want a "decoder" model; for other text tasks, such as computing embeddings, you usually want an "encoder." Here we look at modifying a decoder model to turn it into an encoder.
Access: $ Basic

Bidirectional attention and BERT: Taking off the mask

2025-08-01 | Video | Tags: model-intro, text, BERT, attention
An introduction to BERT, a transformer-type model with bidirectional attention, suited to interesting tasks other than plain generation. It was one of the first powerful models with open weights, and it remains a common baseline against which new models are compared.
Access: $ Basic
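
The "mask" in the title is the causal mask a generative decoder uses so that each position can attend only to itself and earlier positions; bidirectional attention simply omits it. Here is a minimal pure-Python sketch of the difference, assuming single-head scaled dot-product attention on toy vectors. The function `attention_weights` and its `causal` flag are hypothetical names for illustration.

```python
import math

def attention_weights(Q, K, causal):
    """Return the n-by-n matrix of softmax attention weights.

    With causal=True, position i may only attend to positions j <= i
    (the decoder-style mask). With causal=False every position sees
    the whole sequence, as in bidirectional attention.
    """
    d = len(Q[0])
    weights = []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: a future token
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        m = max(scores)  # finite, because position j = 0 is never masked
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention_weights(X, X, causal=True)[0])   # [1.0, 0.0, 0.0]
    print(attention_weights(X, X, causal=False)[0])  # nonzero weight on every token
```

Masked scores are set to negative infinity before the softmax, so they receive exactly zero weight; dropping the mask lets every token draw on context from both sides.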

Grammar is all you get

2025-08-01 | Video | Tags: model-intro, basics, text, AIAYN, attention
An overview of the classic "Attention is all you need" paper, with a focus on the attention mechanism and its resemblance to dependency grammar.
Access: $ Basic
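
For reference, here is the scaled dot-product attention defined in that paper, also written per query, with α_{ij} for the weight that token i places on token j and d_k for the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

\alpha_{ij} = \frac{\exp\!\left(q_i \cdot k_j / \sqrt{d_k}\right)}
                   {\sum_{j'} \exp\!\left(q_i \cdot k_{j'} / \sqrt{d_k}\right)},
\qquad
\mathrm{output}_i = \sum_{j} \alpha_{ij}\, v_j
```

Each row of α is a probability distribution over positions. One informal way to see the resemblance to dependency grammar is to treat the strongest link, argmax_j α_{ij}, as a soft guess at token i's head word. That reading is an illustrative interpretation, not necessarily how the video draws the comparison.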