Tag: basics
Quantization and truthfulness
2026-01-05 quantization basics hallucination logic text Quantization is, at its core, rounding off: an important class of techniques for saving space and computation when running machine learning models. As well as reviewing the general topic of quantization and floating-point numbers, I discuss experiments on how quantization affects truthfulness, that is, the factual accuracy of answers returned by quantized language models. Access: $ Basic
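As a taste of the idea, here is a minimal sketch of symmetric int8 quantization, assuming a single per-tensor scale; names like quantize_int8 are illustrative, and real schemes add zero-points, per-channel scales, and calibration.

```python
# A minimal sketch of symmetric int8 quantization ("rounding off" weights),
# assuming one scale per tensor; illustrative only, not any library's scheme.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto the int8 range [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0   # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values; the difference is quantization error."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))
```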
Optimization with Adam
2025-12-15 basics math theory training Training consists of finding the parameters for a model that give the lowest possible value of the loss function. How do we actually do that, and do it efficiently? The Adam algorithm, published in 2015, is one way, and it remains popular today. Access: $$$ Pro
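For reference, a minimal sketch of the Adam update rule as given in Kingma and Ba's paper, applied to a toy quadratic loss with a hand-coded gradient; the loop and constants are illustrative, not the talk's code.

```python
# A minimal sketch of the Adam update rule; hyperparameter names follow the
# paper, the toy loss (theta**2) and learning rate are illustrative.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0])
m = v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta                          # gradient of loss = theta**2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)                                  # should end up near the minimum at 0
```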
Eigenvectors and Eigenfaces
2025-12-08 applications basics image math theory video Introduction to eigenvectors, which abstract the concept of an "axis" along or around which one might scale or rotate things. Illustrated by a 1991 paper on "eigenfaces," which applies this concept to recognizing faces in images. Access: $ Basic
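A minimal sketch of the core idea, assuming NumPy: an eigenvector of a matrix is a direction the matrix only stretches, never rotates. Eigenfaces apply this to the covariance matrix of face images, but a 2x2 example keeps it visible.

```python
# A minimal sketch of eigenvectors: A @ v points the same way as v,
# scaled by the eigenvalue. The matrix here is arbitrary.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, vec in zip(eigenvalues, eigenvectors.T):
    # A @ vec and lam * vec should match: the direction is preserved.
    print(lam, A @ vec, lam * vec)
```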
Vision Transformers
2025-12-01 AIAYN BERT attention basics image What if we applied the "attention is all you need" architecture to images instead of language? That's the question considered in this paper from 2021, which laid the groundwork for today's multi-modal models. Access: $ Basic
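A minimal sketch of the patch-embedding step that makes this possible, with illustrative shapes rather than the paper's exact configuration: cut the image into patches, flatten each one, and project it to a token embedding that a standard Transformer can attend over.

```python
# A minimal sketch of turning an image into "tokens" for a Vision Transformer;
# the random projection stands in for a learned linear layer.
import numpy as np

def image_to_patch_tokens(image, patch=16, d_model=64, seed=0):
    rng = np.random.default_rng(seed)
    h, w, c = image.shape
    patches = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            patches.append(image[i:i+patch, j:j+patch].reshape(-1))
    patches = np.stack(patches)                  # (num_patches, patch*patch*c)
    projection = rng.standard_normal((patches.shape[1], d_model))
    return patches @ projection                  # (num_patches, d_model) tokens

tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)                              # (196, 64): a 14x14 grid of patch tokens
```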
Huffman to Byte Pair
2025-11-24 basics text tokenization Introduction to two data compression concepts, Huffman coding and byte pair encoding, the latter of which is commonly used to tokenize LLM input. Access: Free account (logged in)
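A minimal sketch of the byte pair encoding half, assuming a toy word list rather than a byte-level corpus: repeatedly merge the most frequent adjacent pair of symbols into a new symbol.

```python
# A minimal sketch of byte pair encoding merges; real tokenizers work on bytes
# over a large corpus, this toy version merges within a small word list.
from collections import Counter

def bpe_merges(words, num_merges=5):
    seqs = [tuple(w) for w in words]          # each word as single-character symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Replace every occurrence of the winning pair with the merged symbol.
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                    out.append(a + b)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(tuple(out))
        seqs = new_seqs
    return merges, seqs

merges, seqs = bpe_merges(["lower", "lowest", "newer", "newest"])
print(merges)                                 # the learned merge rules (candidate subword units)
```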
Automatic differentiation
2025-11-03 basics math theory training Training a machine learning model is one case of the larger class of "optimization" problems; to solve it, you need to calculate how the output (i.e. the loss) changes in relation to inputs (such as weights). I introduce the calculus topic of the derivative, and discuss how to calculate the derivative of a piece of software by augmenting the compiler or interpreter to do it during execution. Access: $ Basic
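A minimal sketch of one way to do this, forward-mode automatic differentiation with dual numbers: each value carries its derivative along, and every arithmetic operation updates both via the chain rule. Illustrative only; production systems such as PyTorch and JAX are far more general.

```python
# A minimal sketch of forward-mode autodiff with "dual numbers":
# every operation propagates both a value and its derivative.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def f(x):
    return x * x * x + x * 2   # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

x = Dual(4.0, 1.0)             # seed derivative dx/dx = 1
y = f(x)
print(y.value, y.deriv)        # 72.0 and 50.0 (= 3*16 + 2)
```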
Linear algebra intro
2025-10-06 basics theory math Introduction to basic concepts that are useful in reading papers: the meaning and purpose of mathematics; vectors; dot products; and matrices. Access: $ Basic
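A minimal sketch of two of those concepts in NumPy, with arbitrary numbers: the dot product as a similarity measure, and a matrix as a function applied to a vector.

```python
# A minimal sketch of dot products and matrix-vector multiplication.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Dot product: multiply elementwise and sum; large when the vectors
# point in similar directions.
print(a @ b)                    # 1*4 + 2*5 + 3*6 = 32.0

# A matrix maps one vector to another; this 2x3 matrix takes a
# 3-dimensional vector to a 2-dimensional one.
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(M @ a)                    # [1+3, 2+3] = [4.0, 5.0]
```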
What's a Model?
2025-09-01 alignment basics theory text Gemma hallucination What do we actually mean when we talk about a "model"? Where do they come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots. Access: Public
Rotary Position Encoding
2025-08-18 basics text AIAYN tokenization I review position encoding - why it's needed, and how classic Transformers do it - and then go into detail on the Rotary Position Embedding (RoPE) enhancement to position encoding. RoPE is widely used in recent large language models. Access: $ Basic
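A minimal sketch of the rotation itself, following the RoFormer recipe for the frequencies but with illustrative dimensions: pairs of query/key dimensions are rotated by an angle proportional to the token's position, so the dot product between two tokens depends only on their relative offset.

```python
# A minimal sketch of rotary position embedding applied to one vector.
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate consecutive dimension pairs of x by position-dependent angles."""
    d = x.shape[-1]
    out = np.empty_like(x)
    for i in range(0, d, 2):
        theta = position / (base ** (i / d))   # lower dimensions rotate faster
        c, s = np.cos(theta), np.sin(theta)
        out[i]     = c * x[i] - s * x[i + 1]
        out[i + 1] = s * x[i] + c * x[i + 1]
    return out

q = np.ones(8)
k = np.ones(8)
# The q.k dot product depends only on the positional gap (5-2 == 13-10),
# so these two numbers come out (numerically) the same.
print(np.dot(rope(q, 2), rope(k, 5)), np.dot(rope(q, 10), rope(k, 13)))
```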
Believable sampling with Mirostat
2025-08-11 basics text sampling It's often hard to choose the right sampling parameters for language generation. This paper introduces Mirostat, a technique for adaptively choosing the value of "k" in top-k sampling to give easier and more consistent control over the information density of the output. Access: $ Basic
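A heavily simplified sketch of the feedback idea, not the paper's actual algorithm (which estimates k from a Zipf fit of the distribution): after each sampled token, compare its surprise to a target and nudge the truncation accordingly. All names and constants here are illustrative.

```python
# A simplified, illustrative feedback loop over top-k sampling: shrink k when
# the sampled token was more surprising than the target, grow it when less.
import numpy as np

def adaptive_top_k_sample(get_probs, steps=20, target_surprise=3.0, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    k = 50                                        # initial truncation
    tokens = []
    for _ in range(steps):
        probs = get_probs(tokens)                 # model's next-token distribution
        top = np.argsort(probs)[::-1][:max(1, int(k))]
        p = probs[top] / probs[top].sum()         # renormalize over the top-k set
        choice = rng.choice(top, p=p)
        surprise = -np.log2(probs[choice])        # in bits
        k -= lr * (surprise - target_surprise)    # too surprising -> shrink k
        tokens.append(int(choice))
    return tokens

# Toy "model": a fixed Zipf-like distribution over a 1000-token vocabulary.
vocab = 1000
zipf = 1.0 / np.arange(1, vocab + 1)
zipf /= zipf.sum()
print(adaptive_top_k_sample(lambda toks: zipf))
```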
Grammar is all you get
2025-08-01 model-intro basics text AIAYN attention An overview of the classic "Attention is all you need" paper, with focus on the attention mechanism and its resemblance to dependency grammar. Access: $ Basic
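A minimal sketch of the scaled dot-product attention at the heart of that paper, single head, no masking, toy sizes: each token's query is compared against every token's key, and the resulting weights mix the value vectors.

```python
# A minimal sketch of scaled dot-product attention.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq, seq): how strongly each token attends to each other
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
Q = rng.standard_normal((seq_len, d_model))
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))
print(attention(Q, K, V).shape)         # (5, 16)
```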
Cheap fine-tuning with LoRA
2025-08-01 basics fine-tuning text image GPT LoRA Rather than retraining a model's entire large weight matrices, we can train smaller, cheaper adjustments that function like software patches. Access: $$$ Pro
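A minimal sketch of the idea, with illustrative sizes: keep the pretrained weight matrix W frozen and learn a low-rank update B @ A that is added to it like a patch (real LoRA also scales the update by alpha/r).

```python
# A minimal sketch of a LoRA-style low-rank "patch" on a frozen weight matrix.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 512, 512, 8

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weights
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small
B = np.zeros((d_out, rank))                   # trainable, starts at zero

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)                       # identical to W @ x until B is trained

# Parameter comparison: full matrix vs. low-rank patch.
print(W.size, A.size + B.size)                # 262144 vs. 8192
```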