Huffman to Byte Pair
2025-11-24 basics text tokenization Introduction to two data compression concepts, one of which is commonly used for LLM input: Huffman and byte pair encoding. Access: Free account (logged in)
Watermarking LLM output
2025-11-17 copyright sampling security text If we're running an LLM service, maybe we don't want users to be able to pass off the model's output as human-written. A simple modification to the search can make the text easily recognizable as LLM output, without disrupting the content or (legitimate) usefulness of the text very much. But will it withstand intelligent attack? Access: $$$ Pro
Catching cheaters with ImpossibleBench
2025-11-10 code alignment applications prompting tools Agentic models used for software engineering often cheat by modifying the tests, or by writing code to the tests rather than the spec, so that the code passes the tests without actually being correct. This talk covers ImpossibleBench, a new dataset intended to help catch cheating by giving models tests that cannot be passed honestly. Access: $ Basic
Quick look: Injective LLMs
2025-11-05 math prompting sampling text theory meta Brief thoughts on the "Injective and invertible LLMs" paper that is making the rounds. My general view on it is negative. Access: Free account (logged in)
Automatic differentiation
2025-11-03 basics math theory training Training a machine learning model is one case of the larger class of "optimization" problems; to solve it, you need to calculate how the output (i.e. the loss) changes in relation to inputs (such as weights). I introduce the calculus topic of the derivative, and discuss how to calculate the derivative of a piece of software by augmenting the compiler or interpreter to do it during execution. Access: $ Basic
Goldfish loss
2025-10-27 training theory text Apertus copyright It may be a problem for text models to generate exact quotes from training data. This paper looks at a simple modification to the training loss function, intended to prevent models from being able to generate exact quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance." Access: Free account (logged in)
Next steps, downtime
2025-10-24 meta I discuss where Matthew Explains stands as we approach three months of operation, and announce brief planned server downtime on November 1. Access: Free account (logged in)
Making music with Moûsai
2025-10-20 applications model-intro audio diffusion The latent diffusion concept applied to music generation: a transformer-type text model generates embeddings from a prompt, which guide a diffusion model to create encoded spectrograms in a latent space, which are translated by another diffusion model into audio waveforms. Access: $$$ Pro
Apertus model intro
2025-10-13 alignment model-intro text Apertus Qwen The Apertus project released two language models in September 2025 that aim to be "sovereign models" embedding Swiss values: in the whitepaper's words they seek to democratize "open and compliant LLMs for global language environments." I dig into what that means, and describe my own experience with the models. Access: $ Basic
Linear algebra intro
2025-10-06 basics theory math Introduction to basic concepts that are useful in reading papers: the meaning and purpose of mathematics; vectors; dot products; and matrices. Access: $ Basic
Quis custodiet reward models
2025-09-29 alignment training text LLaMA Gemma Large language models are "aligned" using smaller, specially trained reward models. These are often secret, and poorly studied even if public. This paper opens the door to exploring reward models by asking them about their values. Access: Free account (logged in)
LLaMA introduction
2025-09-22 model-intro text LLaMA Facebook's entry into the LLM game: the first "open" version of LLaMA from 2023. This is a fairly conventional Transformer-type architecture, influential on the field because it created pressure for everybody to release weights of their announced models. Access: $$$ Pro
Ineffable prompts
2025-09-15 prompting fine-tuning text alignment How do we get models to do what we want? At one extreme, we might pre-train or fine-tune an entire model for a given task. At the other, we might use an existing model and tell it with words - that is, in a prompt - what to do. This paper represents a position in between those two extremes: prompt the model using not words but optimized vectors of hidden layer activations. These can be more expressive and carefully tailored than a prompt restricted to words. Access: $ Basic
Data for testing logical inference
2025-09-08 training tools text logic This short paper introduces a dataset, or rather software for generating one, to test language models' handling of chains of logical inference. Access: $ Basic
What's a Model?
2025-09-01 alignment basics theory text Gemma hallucination What do we actually mean when we talk about a "model"? Where do they come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots. Access: Public
Latent Diffusion
2025-08-25 model-intro image diffusion The highly abstract "diffusion" model concept gets one more significant development: wrapping the model inside an autoencoder that translates between the high-dimensional pixel space and a lower-dimensional latent space with semantic properties. Running a diffusion model inside the latent space has theoretical and practical advantages, and the authors of the paper apply that to a range of image-generation problems. Access: $$$ Pro
Rotary Position Encoding
2025-08-18 basics text AIAYN tokenization I review position encoding - why it's needed, and how classic Transformers do it - and then go into detail on the Rotary Position Embedding (RoPE) enhancement to position encoding. RoPE is widely used in recent large language models. Access: $ Basic
Believable sampling with Mirostat
2025-08-11 basics text sampling It's often hard to choose the right sampling parameters for language generation. This paper introduces Mirostat, a technique for adaptively choosing the value of "k" in top-k sampling to give easier and more consistent control over the information density of the output. Access: $ Basic
Original diffusion: Adding noise to remove it
2025-08-04 theory image diffusion Some of the underlying theory for diffusion-type models, which have become popular for image generation. This paper is one of the original sources for the diffusion approach: it introduces not a specific model but the very general abstract concepts used in subsequent models. Access: Free account (logged in)
Welcome to Matthew Explains
2025-08-01 meta Introductory posting and call for discussion. Access: Free account (logged in)