Tag: theory


Clustering with k-means

2026-02-02 | Video | Tags: basics, math, quantization, theory
The k-means method finds clusters of related items in a collection of Euclidean vectors. It's a simple building-block algorithm that works well in practice.
Access: $ Basic

Memorization and generalization

2026-01-19 | Video | Tags: memorization, text, theory, training
How much arbitrary information, like random bits, can a language model memorize during training? This paper suggests the answer is 3.6 bits per parameter.
Access: Free account (logged in)

Cross-entropy

2026-01-12 | Video | Tags: basics, math, theory, training
Entropy is the negative logarithm of probability, averaged over all outcomes. Cross-entropy is a similar calculation, involving logs of probabilities from one distribution averaged over a different distribution. These concepts form an excuse for reading Claude Shannon's classic paper "A Mathematical Theory of Communication"; and cross-entropy in particular is the most popular loss function for language model training.
Access: $$$ Pro
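
The two definitions in the summary above can be sketched in a few lines of Python. This is only an illustration, not code from the video: the function names and the example distributions `p` and `q` are made up here, and base-2 logarithms are assumed so that the results come out in bits.

```python
import math

def entropy(p):
    """Average of -log2 p(x), weighted by p itself (result in bits)."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """Average of -log2 q(x), weighted by p (result in bits).

    Assumes q(x) > 0 wherever p(x) > 0; otherwise the value is infinite.
    """
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

# Hypothetical distributions over three outcomes.
p = [0.5, 0.25, 0.25]   # the distribution we average over
q = [0.25, 0.25, 0.5]   # the distribution whose log-probabilities we take

print(entropy(p))           # 1.5
print(cross_entropy(p, q))  # 1.75
```

In this example the cross-entropy (1.75 bits) is larger than the entropy of `p` (1.5 bits). Cross-entropy is never smaller than the entropy of `p` and equals it only when `q` matches `p`, which is what makes it usable as a training loss.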

Optimization with Adam

2025-12-15 | Video | Tags: basics, math, theory, training
Training consists of finding the parameters for a model that will give the lowest possible value of the loss function. How do we actually do that, and do it efficiently? The Adam algorithm, from 2015, is one way, and still popular today.
Access: $$$ Pro

Eigenvectors and Eigenfaces

2025-12-08 | Video | Tags: applications, basics, image, math, theory, video
Introduction to eigenvectors, which abstract the concept of an "axis" along or around which one might scale or rotate things. Illustrated by a 1991 paper on "eigenfaces," which applies this concept to recognizing faces in images.
Access: $ Basic

Quick look: Injective LLMs

2025-11-05 | Tags: math, prompting, sampling, text, theory, meta
Brief thoughts on the "Injective and invertible LLMs" paper that is making the rounds. My general view on it is negative.
Access: Free account (logged in)

Automatic differentiation

2025-11-03 | Video | Tags: basics, math, theory, training
Training a machine learning model is one case of the larger class of "optimization" problems; to solve it, you need to calculate how the output (i.e. the loss) changes in relation to inputs (such as weights). I introduce the calculus topic of the derivative, and discuss how to calculate the derivative of a piece of software by augmenting the compiler or interpreter to do it during execution.
Access: $ Basic

Goldfish loss

2025-10-27 | Video | Tags: training, theory, text, Apertus, copyright, memorization
It may be a problem for text models to generate exact quotes from training data. This paper looks at a simple modification to the training loss function, intended to prevent models from being able to generate exact quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance."
Access: Free account (logged in)

Linear algebra intro

2025-10-06 | Video | Tags: basics, theory, math
Introduction to basic concepts that are useful in reading papers: the meaning and purpose of mathematics; vectors; dot products; and matrices.
Access: $ Basic

What's a Model?

2025-09-01 | Video | Tags: alignment, basics, theory, text, Gemma, hallucination
What do we actually mean when we talk about a "model"? Where do they come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots.
Access: Public

Original diffusion: Adding noise to remove it

2025-08-04 | Video | Tags: theory, image, diffusion
Some of the underlying theory for diffusion-type models, which have become popular for image generation. This paper is one of the original sources for the diffusion approach; it does not introduce a specific model, but rather the very general abstract concepts used in subsequent models.
Access: Free account (logged in)

Embeddings from generative models

2025-08-01 | Video | Tags: theory, applications, attention, Mistral
For text generation you usually want a "decoder" model; for other text tasks you usually want an "encoder." Here we look at modifying a decoder model to change it into an encoder.
Access: $ Basic

Features are not what you think

2025-08-01 | Video | Tags: theory, security, image
Two interesting things about neural network image classifiers: one, the individual neurons don't seem to be special in terms of detecting meaningful features; and two, it's frighteningly easy to construct adversarial examples that will fool the classification.
Access: $ Basic

Better (than) tokenization with BLTs

2025-08-01 | Video | Tags: theory, text, LLaMA, tokenization
Using "patches" of input bytes, instead of a fixed token list, allows better scalability and improves performance on some tasks that are hard for token-based LLMs.
Access: $ Basic

Dog-whistle GANs

2025-05-21 | Video | Tags: basics, training, security, theory, image, GAN
Generative Adversarial Nets, and their implications for watermarking generated text.
Access: Public

Matthew Explains

North Coast Synthesis Ltd.