Rappaccini's language model
2025-08-01 alignment text toxicity There's a lot of talk about generative models producing "toxic" output, but what does that actually mean? How can we measure or prevent it, and is it even a good idea to try? Access: $$$ Pro
Embeddings from generative models
2025-08-01 theory applications attention Mistral For text generation you usually want a "decoder" model; for other text tasks you usually want an "encoder." Here we look at how to modify a decoder model into an encoder. Access: $ Basic
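One common way to get an embedding out of a decoder, as a taste of what the post covers: pool the model's per-token hidden states into a single fixed-size vector. This is a minimal sketch with random arrays standing in for real model output; the sizes and the padding mask are illustrative, not taken from any particular model.

```python
import numpy as np

# Random stand-in for a decoder's per-token hidden states
# (in practice these would come from the model's last layer).
rng = np.random.default_rng(0)
seq_len, hidden = 7, 16
hidden_states = rng.standard_normal((seq_len, hidden))

# Hypothetical padding mask: last two positions are padding tokens.
mask = np.array([1, 1, 1, 1, 1, 0, 0], dtype=float)

# Masked mean pooling: average hidden states over real tokens only,
# yielding one fixed-size embedding for the whole sequence.
embedding = (hidden_states * mask[:, None]).sum(axis=0) / mask.sum()

print(embedding.shape)  # (16,)
```

Mean pooling is only one option; taking the final token's hidden state is another common choice for causal decoders, since that position has attended to the whole sequence.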
Features are not what you think
2025-08-01 theory security image Two interesting things about neural network image classifiers: first, individual neurons don't seem to be special in terms of detecting meaningful features; and second, it's frighteningly easy to construct adversarial examples that fool the classifier. Access: $ Basic
The road to MoE
2025-08-01 model-intro text DeepSeek MoE General coverage of the "Mixture of Experts" (MoE) technique, plus specific details of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements, along with some load-balancing tricks, all of which went into their recently notable model. Access: $$$ Pro
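The core routing idea behind MoE can be sketched in a few lines. This is a generic top-k router, not DeepSeek's exact scheme; all names and sizes are illustrative, and each "expert" is a bare matrix standing in for a feed-forward block.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, n_experts, top_k = 8, 4, 2
x = rng.standard_normal(hidden)                    # one token's hidden state
W_gate = rng.standard_normal((n_experts, hidden))  # learned gating weights

# Score every expert for this token, then keep only the top-k.
logits = W_gate @ x
chosen = np.argsort(logits)[-top_k:]

# Softmax over the chosen experts' scores gives mixing weights.
weights = np.exp(logits[chosen])
weights /= weights.sum()

# Run only the chosen experts and combine their outputs.
experts = [rng.standard_normal((hidden, hidden)) for _ in range(n_experts)]
y = sum(w * (experts[i] @ x) for w, i in zip(weights, chosen))

print(y.shape)  # (8,)
```

The payoff is that only `top_k` of the `n_experts` expert blocks run per token, so parameter count can grow without a matching growth in per-token compute; the load-balancing tricks the post covers exist to keep the gate from routing everything to a few favorite experts.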
Bidirectional attention and BERT: Taking off the mask
2025-08-01 model-intro text BERT attention Introduction to BERT, a transformer-type model with bidirectional attention, suited to interesting tasks other than plain generation. This was one of the first powerful models to have open weights, and it remains a common baseline against which new models are compared. Access: $ Basic
Grammar is all you get
2025-08-01 model-intro basics text AIAYN attention An overview of the classic "Attention is all you need" paper, with focus on the attention mechanism and its resemblance to dependency grammar. Access: $ Basic
Cheap fine-tuning with LoRA
2025-08-01 basics fine-tuning text image GPT LoRA Rather than re-training the entire large matrices of a model, we can train smaller, cheaper adjustments that function like software patches. Access: $$$ Pro
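The "software patch" idea can be made concrete: LoRA freezes a pretrained weight matrix W and trains only two small low-rank factors, so the effective weight becomes W + B @ A. A minimal sketch, with illustrative sizes and random data:

```python
import numpy as np

# Hypothetical sizes; real model matrices are far larger.
d_out, d_in, rank = 64, 64, 4

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight

# The trainable "patch": two small factors whose product is low-rank.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # zero init, so the patch starts as a no-op

x = rng.standard_normal(d_in)
y_base = W @ x
y_lora = W @ x + B @ (A @ x)

# Trainable parameters: 512 for the patch vs 4096 for the full matrix.
print(A.size + B.size, W.size)
```

Since only A and B receive gradients, fine-tuning touches a small fraction of the parameters, and the patch can be merged into W (or swapped out for another patch) after training.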
Better (than) tokenization with BLTs
2025-08-01 theory text LLaMA tokenization Using "patches" of input bytes, instead of a fixed token list, allows better scalability and improves performance on some tasks that are hard for token-based LLMs. Access: $ Basic
Generate and read: Oh no they didn't
2025-05-21 prompting text GPT RAG hallucination What if instead of looking up facts in Wikipedia, you just used a language model to generate fake Wikipedia articles? Access: Public
Dog-whistle GANs
2025-05-21 basics training security theory image GAN Generative Adversarial Nets and their implications for watermarking generated text. Access: Public