Data for testing logical inference
2025-09-08 training tools text logic This short paper introduces a dataset, or rather software for generating one, to test language models' handling of chains of logical inference. Access: $ Basic
What's a Model?
2025-09-01 alignment basics theory text Gemma hallucination What do we actually mean when we talk about a "model"? Where do models come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots. Access: Public
Latent Diffusion
2025-08-25 model-intro image diffusion The highly abstract "diffusion" model concept gets one more significant development: wrapping the model inside an autoencoder that translates between the high-dimensional pixel space and a lower-dimensional latent space with semantic properties. Running a diffusion model inside the latent space has theoretical and practical advantages, and the authors of the paper apply it to a range of image-generation problems. Access: $$$ Pro
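A toy sketch of the pipeline described above, assuming a convolutional stand-in for the real VAE and U-Net (this is illustrative, not the paper's code): the autoencoder maps between pixel space and a small latent space, and the iterative denoising loop runs entirely in the latents.

```python
import torch
import torch.nn as nn

encoder  = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # pixels (3x256x256) -> latent (4x32x32)
decoder  = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # latent -> pixels
denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)           # stand-in for the latent-space U-Net

# Sampling: start from pure noise in the *latent* space, denoise step by step,
# and decode to pixels once at the very end. The expensive diffusion loop only
# ever touches the small 4x32x32 latents, never the full-resolution image.
z = torch.randn(1, 4, 32, 32)
for t in reversed(range(50)):
    z = z - 0.02 * denoiser(z)        # crude stand-in for a real DDPM/DDIM update
image = decoder(z)
print(image.shape)                    # torch.Size([1, 3, 256, 256])
```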
Rotary Position Encoding
2025-08-18 basics text AIAYN tokenization I review position encoding - why it's needed, and how classic Transformers do it - and then go into detail on the Rotary Position Embedding (RoPE) enhancement to position encoding. RoPE is widely used in recent large language models. Access: $ Basic
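A minimal sketch of the rotary idea (assuming the half-split layout; not a production implementation): pairs of query/key features are rotated by an angle that grows with the token's position, so dot products depend only on relative distance.

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate the feature pairs of x (shape [d], d even) by position-dependent angles."""
    half = x.shape[0] // 2
    freqs = base ** (-np.arange(half) / half)       # one rotation frequency per feature pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

q = np.random.randn(8)
k = np.random.randn(8)
# Key property: shifting both positions by the same amount leaves the dot
# product unchanged, so attention scores see only relative position.
print(rope(q, 3) @ rope(k, 7), rope(q, 103) @ rope(k, 107))   # same value
```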
Believable sampling with Mirostat
2025-08-11 basics text sampling It's often hard to choose the right sampling parameters for language generation. This paper introduces Mirostat, a technique for adaptively choosing the value of "k" in top-k sampling to give easier and more consistent control over the information density of the output. Access: $ Basic
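A simplified illustration of the feedback-control idea behind Mirostat, not the exact algorithm from the paper: after each sampled token, compare its surprise (-log2 p) with a target and nudge the truncation threshold so future tokens stay near that target.

```python
import numpy as np

rng = np.random.default_rng(0)

def mirostat_like_sample(logits, mu, tau=3.0, eta=0.1):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    surprises = -np.log2(probs)                     # "surprise" of each candidate token
    allowed = np.where(surprises <= mu)[0]          # adaptive truncation ~ an adaptive k
    if allowed.size == 0:
        allowed = np.array([probs.argmax()])
    p = probs[allowed] / probs[allowed].sum()
    token = rng.choice(allowed, p=p)
    mu -= eta * (-np.log2(probs[token]) - tau)      # feedback step toward the target surprise
    return token, mu

mu = 6.0                                            # threshold starts loose, then self-adjusts
for _ in range(5):
    logits = rng.standard_normal(50)                # stand-in for a model's next-token logits
    token, mu = mirostat_like_sample(logits, mu)
    print(int(token), round(mu, 3))
```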
Original diffusion: Adding noise to remove it
2025-08-04 theory image diffusion Some of the underlying theory for diffusion-type models, which have become popular for image generation. This paper is one of the original sources for the diffusion approach: it introduces not a specific model but the very general abstract concepts used in subsequent models. Access: Free account (logged in)
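A minimal sketch of the forward ("noising") half of that abstraction, with an illustrative variance schedule: the clean sample is mixed with Gaussian noise according to q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), and the model's job is to learn to reverse this.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # a simple linear variance schedule (illustrative)
alpha_bar = np.cumprod(1.0 - betas)

def add_noise(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0): scaled signal plus Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

x0 = np.ones(4)
for t in (0, 500, 999):
    xt, _ = add_noise(x0, t)
    print(t, xt.round(2))   # by t=999 the signal has been destroyed; only noise remains
```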
Welcome to Matthew Explains
2025-08-01 meta Introductory posting and call for discussion. Access: Free account (logged in)
Rappaccini's language model
2025-08-01 alignment text toxicity There's a lot of talk about generative models producing "toxic" output, but what does that actually mean? How can we measure it or prevent it, and is it even a good idea to try? Access: $$$ Pro
Embeddings from generative models
2025-08-01 theory applications attention Mistral For text generation you usually want a "decoder" model; for other text tasks you usually want an "encoder." Here we look at modifying a decoder model to change it into an encoder. Access: $ Basic
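A hedged sketch of one common recipe for getting embeddings out of a decoder-only model (the post may use a different one): take the per-token hidden states and pool them, mean pooling here. "gpt2" is just a small stand-in decoder so the example runs quickly.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def embed(text):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # [1, seq_len, d_model]
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean-pool over tokens

a = embed("a cat sat on the mat")
b = embed("a kitten rested on the rug")
print(torch.cosine_similarity(a, b))                        # higher = more similar
```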
Features are not what you think
2025-08-01 theory security image Two interesting things about neural network image classifiers: one, the individual neurons don't seem to be special in terms of detecting meaningful features; and two, it's frighteningly easy to construct adversarial examples that will fool the classifier. Access: $ Basic
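A sketch of one classic recipe for constructing adversarial examples, the fast gradient sign method; the paper covered in the post may construct them differently, and the tiny linear "classifier" below is only a placeholder so the code runs.

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=0.03):
    """Nudge every pixel by +/- eps in whichever direction increases the loss."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    return (image + eps * image.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(1, 3, 32, 32), torch.tensor([7])
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())   # perturbation stays <= eps, visually imperceptible
```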
The road to MoE
2025-08-01 model-intro text DeepSeek MoE General coverage of the "Mixture of Experts" (MoE) technique, and specific details of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements to it, as well as some load-balancing tricks, all of which went into their recently-notable model. Access: $$$ Pro
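A minimal sketch of MoE routing with a shared expert that every token always uses, plus a router that sends each token to its top-k specialised experts. Sizes, weight normalization, and the load-balancing tricks are omitted or illustrative, not DeepSeek's implementation.

```python
import torch
import torch.nn as nn

d_model, n_experts, top_k = 16, 8, 2
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
shared_expert = nn.Linear(d_model, d_model)
router = nn.Linear(d_model, n_experts)

def moe_layer(x):                                   # x: [tokens, d_model]
    scores = router(x).softmax(dim=-1)              # routing probabilities per token
    weights, chosen = scores.topk(top_k, dim=-1)    # each token picks its top-k experts
    out = shared_expert(x)                          # the shared expert sees every token
    for slot in range(top_k):
        for e in range(n_experts):
            mask = chosen[:, slot] == e
            if mask.any():                          # only the chosen experts do any work
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_layer(torch.randn(5, d_model)).shape)     # torch.Size([5, 16])
```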
Bidirectional attention and BERT: Taking off the mask
2025-08-01 model-intro text BERT attention Introduction to BERT, a transformer-type model with bidirectional attention, suited to interesting tasks other than plain generation. This was one of the first powerful models to have open weights; and it remains a common baseline to which new models can be compared. Access: $ Basic
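A quick illustration (not from the post) of BERT's masked-token objective, using the Hugging Face fill-mask pipeline with the original BERT weights.

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 3))
# Because attention is bidirectional, the prediction for [MASK] can draw on
# context from both its left and its right.
```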
Grammar is all you get
2025-08-01 model-intro basics text AIAYN attention An overview of the classic "Attention is all you need" paper, with focus on the attention mechanism and its resemblance to dependency grammar. Access: $ Basic
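A minimal sketch of the scaled dot-product attention at the heart of that paper: each position's output is a weighted average of value vectors, with weights given by softmaxed query-key similarity.

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over keys
    return weights @ V                                       # weighted sum of value vectors

seq_len, d_k = 4, 8
Q = K = V = np.random.randn(seq_len, d_k)                    # self-attention: all from the same sequence
print(attention(Q, K, V).shape)                              # (4, 8)
```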
Cheap fine-tuning with LoRA
2025-08-01 basics training text image GPT LoRA Rather than retraining a model's entire large weight matrices, we can train smaller, cheaper adjustments that function like software patches. Access: $$$ Pro
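A minimal sketch of the LoRA idea with illustrative shapes: keep the big pretrained matrix W frozen and learn only a low-rank update B @ A, so the effective weight is W + (alpha / r) * B @ A.

```python
import torch
import torch.nn as nn

d_in, d_out, r, alpha = 512, 512, 8, 16
W = torch.randn(d_out, d_in, requires_grad=False)      # frozen pretrained weights
A = nn.Parameter(torch.randn(r, d_in) * 0.01)          # trainable: d_in -> r
B = nn.Parameter(torch.zeros(d_out, r))                # trainable: r -> d_out (starts at zero)

def lora_linear(x):
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)     # original path + low-rank "patch"

x = torch.randn(2, d_in)
print(lora_linear(x).shape)                            # torch.Size([2, 512])
# Only A and B (2 * r * d_in = ~8k values here) are updated during fine-tuning,
# instead of the full d_out * d_in = ~262k values in W.
```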
Better (than) tokenization with BLTs
2025-08-01 theory text LLaMA tokenization Using "patches" of input bytes, instead of a fixed token list, allows better scalability and improves performance on some tasks that are hard for token-based LLMs. Access: $ Basic
Generate and read: Oh no they didn't
2025-05-21 prompting text GPT RAG hallucination What if instead of looking up facts in Wikipedia, you just used a language model to generate fake Wikipedia articles? Access: Public
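A hedged sketch of that "generate, then read" idea: instead of retrieving a real document, ask a language model to write one and then answer the question from the generated context. gpt2 is only a small stand-in so the example runs; the setup the post discusses uses far larger models.

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

question = "When was the first transatlantic telegraph cable completed?"

# Step 1: generate a background "article" rather than looking one up.
article = generate(
    f"Write a short encyclopedia article that answers: {question}\n\nArticle:",
    max_new_tokens=80,
)[0]["generated_text"]

# Step 2: read the generated article and answer the question from it.
answer = generate(
    f"{article}\n\nQuestion: {question}\nAnswer:",
    max_new_tokens=20,
)[0]["generated_text"]
print(answer)
```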