Cross-entropy
2026-01-12 basics math theory training Entropy is the negative logarithm of probability, averaged over all outcomes. Cross-entropy is a similar calculation, averaging log probabilities from one distribution over outcomes drawn from a different distribution. These concepts form an excuse for reading Claude Shannon's classic paper A Mathematical Theory of Communication, and cross-entropy in particular is the most popular loss function for language model training. Access: $$$ Pro
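For orientation before the paywall, the standard definitions fit in two lines (notation mine, not necessarily the post's):

```latex
% Entropy: negative log probability, averaged over outcomes drawn from p
H(p) = -\sum_{x} p(x) \log p(x)
% Cross-entropy: log probabilities from q, averaged over outcomes drawn from p
H(p, q) = -\sum_{x} p(x) \log q(x)
```

In language model training, p is the one-hot distribution over the correct next token, so H(p, q) reduces to the negative log probability the model q assigned to that token.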
Quantization and truthfulness
2026-01-05 quantization basics hallucination logic text Quantization, essentially rounding numbers off, is an important class of techniques for saving space and computation when running machine learning models. As well as reviewing the general topic of quantization and floating-point numbers, I discuss experiments on how quantization affects truthfulness: the factual accuracy of answers returned by quantized language models. Access: $ Basic
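As a taste of the topic, here is a minimal sketch of one simple scheme, symmetric absmax int8 quantization; the scheme is an illustrative choice on my part, not necessarily the one the article's experiments use:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: round floats onto a grid of int8 steps."""
    scale = np.abs(w).max() / 127.0                      # one scale per tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floats; the rounding error is the quantization noise."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())            # at most scale / 2
```

The int8 tensor takes a quarter of the space of float32, at the cost of that rounding noise; the article's question is what the noise does to factual accuracy.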
Interest check: electronics education
2026-01-02 electronics meta I'm looking for feedback on the possibility of expanding this site's subject coverage to include electronics design education. Access: Public
The BLEU sausage
2025-12-29 translation evaluation text tools Every paper has an "evaluation" table showing how its new idea scores higher than previous work in the same domain; but where do those numbers actually come from? Here we look at BLEU, a classic metric for evaluating the quality of machine translation. Access: $ Basic
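Since BLEU is ultimately just arithmetic on n-gram counts, a toy version fits in a few lines. This is my single-reference simplification; real implementations (sacreBLEU, for instance) add smoothing and careful tokenization:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Toy BLEU: geometric mean of clipped n-gram precisions, times a
    brevity penalty that punishes suspiciously short candidates."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(sum(cand.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))  # crude smoothing
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))
```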
Qwen3 introduction
2025-12-22 Qwen alignment model-intro text Overview and introduction for the Qwen3 family of open-weights language models from Alibaba, introduced in May 2025. Access: Free account (logged in)
Optimization with Adam
2025-12-15 basics math theory training Training consists of finding the parameters that give a model the lowest possible value of the loss function. How do we actually do that, and do it efficiently? The Adam algorithm, introduced in 2015, is one answer, and it remains popular today. Access: $$$ Pro
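For the impatient, the whole update rule fits in one short function. A minimal NumPy sketch, with the hyperparameter defaults from Kingma and Ba's paper (the toy quadratic loss is mine):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: running estimates of the gradient's mean and
    (uncentered) variance set a per-parameter step size."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0])                          # minimize loss(theta) = theta**2
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta                             # d(theta**2)/d(theta)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)                                     # near the minimum at 0
```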
Eigenvectors and Eigenfaces
2025-12-08 applications basics image math theory video Introduction to eigenvectors, which abstract the concept of an "axis" along or around which one might scale or rotate things. Illustrated by a 1991 paper on "eigenfaces," which applies this concept to recognizing faces in images. Access: $ Basic
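The one equation worth keeping in mind (standard notation, not necessarily the article's):

```latex
% v is an eigenvector of the matrix A, with eigenvalue \lambda:
% applying A scales v by \lambda without changing its direction.
A\mathbf{v} = \lambda\mathbf{v}, \qquad \mathbf{v} \neq \mathbf{0}
```

In the 1991 paper, A is built from the covariance of a set of face images, and its leading eigenvectors, reshaped back into images, are the "eigenfaces."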
Vision Transformers
2025-12-01 AIAYN BERT attention basics image What if we applied the "attention is all you need" architecture to images instead of language? That's the question considered in this paper from 2021, which laid the groundwork for today's multi-modal models. Access: $ Basic
Huffman to Byte Pair
2025-11-24 basics text tokenization Introduction to two data compression concepts: Huffman coding and byte pair encoding, the latter of which is commonly used to tokenize LLM input. Access: Free account (logged in)
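To preview the byte pair half: BPE repeatedly replaces the most frequent adjacent pair of tokens with a new merged token. A toy training loop, skipping the word-boundary handling that real tokenizers add:

```python
from collections import Counter

def bpe_train(text: str, num_merges: int):
    """Toy byte pair encoding: greedily merge the most frequent adjacent pair."""
    tokens = list(text)                               # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]           # most frequent adjacent pair
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
                merged.append(a + b); i += 2          # replace the pair with one token
            else:
                merged.append(tokens[i]); i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("low lower lowest", 4)
print(merges, tokens)
```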
Watermarking LLM output
2025-11-17 copyright sampling security text If we're running an LLM service, maybe we don't want users to be able to pass off the model's output as human-written. A simple modification to the decoding procedure can make the text easily recognizable as LLM output, without much disruption to the content or the (legitimate) usefulness of the text. But will it withstand intelligent attack? Access: $$$ Pro
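The post is paid, but the flavor of one published scheme (the "green list" watermark of Kirchenbauer et al.; my simplification, not necessarily the exact variant the post analyzes) is easy to sketch:

```python
import hashlib
import numpy as np

def green_mask(prev_token_id: int, vocab_size: int, fraction: float = 0.5) -> np.ndarray:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token_id).encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.random(vocab_size) < fraction          # True marks a "green" token

def watermarked_logits(logits: np.ndarray, prev_token_id: int, delta: float = 2.0) -> np.ndarray:
    """Nudge sampling toward green tokens. A detector that can recompute the
    masks counts green tokens and flags text that is improbably green."""
    return logits + delta * green_mask(prev_token_id, logits.shape[0])
```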
Catching cheaters with ImpossibleBench
2025-11-10 code alignment applications prompting tools Agentic models used for software engineering often cheat by modifying the tests, or by writing code to the tests rather than to the spec, so that their code passes the tests without actually being correct. This talk covers ImpossibleBench, a new dataset intended to help catch cheating by giving models tests that cannot be passed honestly. Access: $ Basic
Quick look: Injective LLMs
2025-11-05 math prompting sampling text theory meta Brief thoughts on the "Injective and invertible LLMs" paper that is making the rounds. My general view on it is negative. Access: Free account (logged in)
Automatic differentiation
2025-11-03 basics math theory training Training a machine learning model is one case of the larger class of "optimization" problems; to solve it, you need to calculate how the output (i.e. the loss) changes in relation to inputs (such as weights). I introduce the calculus topic of the derivative, and discuss how to calculate the derivative of a piece of software by augmenting the compiler or interpreter to do it during execution. Access: $ Basic
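Models are trained with the reverse-mode flavor, but forward mode with dual numbers shows the "differentiate while executing" idea in very few lines. A sketch, not the article's code:

```python
class Dual:
    """A number that carries its derivative along through every operation,
    so ordinary code computes f(x) and f'(x) in one pass (forward mode)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)   # sum rule
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)  # product rule
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1      # plain code; no symbolic math anywhere

y = f(Dual(2.0, 1.0))                 # dot=1.0 means "differentiate w.r.t. x"
print(y.val, y.dot)                   # 17.0 and f'(2) = 6*2 + 2 = 14.0
```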
Goldfish loss
2025-10-27 training theory text Apertus copyright It may be a problem for text models to generate exact quotes from training data. This paper looks at a simple modification to the training loss function, intended to prevent models from being able to generate exact quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance." Access: Free account (logged in)
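As I read the paper, the core move is to deterministically exclude a pseudorandom subset of token positions from the next-token loss, so the model never gets gradient for a complete quote. A sketch of that idea (the hashing window and rates here are simplified guesses, not the paper's exact recipe):

```python
import hashlib
import torch
import torch.nn.functional as F

def goldfish_mask(token_ids, k=4, drop_every=4):
    """Drop ~1/drop_every of positions, keyed on the preceding k-gram, so the
    same passage always trains with the same gaps wherever it appears."""
    mask = torch.ones(len(token_ids))
    for i in range(k, len(token_ids)):
        ctx = ",".join(map(str, token_ids[i - k:i])).encode()
        if int.from_bytes(hashlib.sha256(ctx).digest()[:4], "big") % drop_every == 0:
            mask[i] = 0.0                        # no gradient for this token
    return mask

def goldfish_loss(logits, targets):
    """Ordinary next-token cross-entropy, minus the masked positions."""
    per_token = F.cross_entropy(logits, targets, reduction="none")
    mask = goldfish_mask(targets.tolist())
    return (per_token * mask).sum() / mask.sum()
```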
Next steps, downtime
2025-10-24 meta I discuss where Matthew Explains stands as we approach three months of operation, and announce brief planned server downtime on November 1. Access: Free account (logged in)
Making music with Moûsai
2025-10-20 applications model-intro audio diffusion The latent diffusion concept applied to music generation: a transformer-type text model generates embeddings from a prompt, which guide a diffusion model to create encoded spectrograms in a latent space, which are translated by another diffusion model into audio waveforms. Access: $$$ Pro
Apertus model intro
2025-10-13 alignment model-intro text Apertus Qwen The Apertus project released two language models in September 2025 that aim to be "sovereign models" embedding Swiss values: in the whitepaper's words they seek to democratize "open and compliant LLMs for global language environments." I dig into what that means, and describe my own experience with the models. Access: $ Basic
Linear algebra intro
2025-10-06 basics theory math Introduction to basic concepts that are useful in reading papers: the meaning and purpose of mathematics; vectors; dot products; and matrices. Access: $ Basic
Quis custodiet reward models
2025-09-29 alignment training text LLaMA Gemma Large language models are "aligned" using smaller, specially trained reward models. These are often secret, and poorly studied even if public. This paper opens the door to exploring reward models by asking them about their values. Access: Free account (logged in)
LLaMA introduction
2025-09-22 model-intro text LLaMA Facebook's entry into the LLM game: the first "open" version of LLaMA, from 2023. It has a fairly conventional Transformer-type architecture, and it was influential because it created pressure for everybody else to release the weights of their announced models. Access: $$$ Pro