Tag: attention
Vision Transformers
2025-12-01 · AIAYN, BERT, attention, basics, image
What if we applied the "attention is all you need" architecture to images instead of language? That's the question asked in this 2021 paper, which laid the groundwork for today's multi-modal models.
Embeddings from generative models
2025-08-01 · theory, applications, attention, Mistral
For text generation you usually want a "decoder" model; for other text tasks you usually want an "encoder." Here we look at converting a decoder model into an encoder.
Bidirectional attention and BERT: Taking off the mask
2025-08-01 · model-intro, text, BERT, attention
An introduction to BERT, a transformer model with bidirectional attention, suited to interesting tasks other than plain generation. It was one of the first powerful models with open weights, and it remains a common baseline against which new models are compared.
Grammar is all you get
2025-08-01 · model-intro, basics, text, AIAYN, attention
An overview of the classic "Attention is all you need" paper, focusing on the attention mechanism and its resemblance to dependency grammar.