Data for testing logical inference
2025-09-08 training tools text logic This short paper introduces a dataset, or rather software for generating one, to test language models' handling of chains of logical inference. Access: $ Basic
What's a Model?
2025-09-01 alignment basics theory text Gemma hallucination What do we actually mean when we talk about a "model"? Where do models come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots. Access: Public
Latent Diffusion
2025-08-25 model-intro image diffusion The highly abstract "diffusion" model concept gets one more significant development: wrapping the model inside an autoencoder that translates between the high-dimensional pixel space and a lower-dimensional latent space with semantic properties. Running a diffusion model inside the latent space has theoretical and practical advantages, and the authors of the paper apply it to a range of image-generation problems. Access: $$$ Pro
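A toy sketch of the pipeline described above, assuming a convolutional stand-in for the real VAE and U-Net (this is illustrative, not the paper's code): the autoencoder maps between pixel space and a small latent space, and the iterative denoising loop runs entirely in the latents.

```python
import torch
import torch.nn as nn

encoder  = nn.Conv2d(3, 4, kernel_size=8, stride=8)            # pixels (3x256x256) -> latent (4x32x32)
decoder  = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)   # latent -> pixels
denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)           # stand-in for the latent-space U-Net

# Sampling: start from pure noise in the *latent* space, denoise step by step,
# and decode to pixels once at the very end. The expensive diffusion loop only
# ever touches the small 4x32x32 latents, never the full-resolution image.
z = torch.randn(1, 4, 32, 32)
for t in reversed(range(50)):
    z = z - 0.02 * denoiser(z)        # crude stand-in for a real DDPM/DDIM update
image = decoder(z)
print(image.shape)                    # torch.Size([1, 3, 256, 256])
```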
Rotary Position Encoding
2025-08-18 basics text AIAYN tokenization I review position encoding - why it's needed, and how classic Transformers do it - and then go into detail on the Rotary Position Embedding (RoPE) enhancement to position encoding. RoPE is widely used in recent large language models. Access: $ Basic
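A minimal sketch of the rotary idea (assuming the half-split layout; not a production implementation): pairs of query/key features are rotated by an angle that grows with the token's position, so dot products depend only on relative distance.

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate the feature pairs of x (shape [d], d even) by position-dependent angles."""
    half = x.shape[0] // 2
    freqs = base ** (-np.arange(half) / half)       # one rotation frequency per feature pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

q = np.random.randn(8)
k = np.random.randn(8)
# Key property: shifting both positions by the same amount leaves the dot
# product unchanged, so attention scores see only relative position.
print(rope(q, 3) @ rope(k, 7), rope(q, 103) @ rope(k, 107))   # same value
```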
Believable sampling with Mirostat
2025-08-11 basics text sampling It's often hard to choose the right sampling parameters for language generation. This paper introduces Mirostat, a technique for adaptively choosing the value of "k" in top-k sampling to give easier and more consistent control over the information density of the output. Access: $ Basic
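A simplified illustration of the feedback-control idea behind Mirostat, not the exact algorithm from the paper: after each sampled token, compare its surprise (-log2 p) with a target and nudge the truncation threshold so future tokens stay near that target.

```python
import numpy as np

rng = np.random.default_rng(0)

def mirostat_like_sample(logits, mu, tau=3.0, eta=0.1):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    surprises = -np.log2(probs)                     # "surprise" of each candidate token
    allowed = np.where(surprises <= mu)[0]          # adaptive truncation ~ an adaptive k
    if allowed.size == 0:
        allowed = np.array([probs.argmax()])
    p = probs[allowed] / probs[allowed].sum()
    token = rng.choice(allowed, p=p)
    mu -= eta * (-np.log2(probs[token]) - tau)      # feedback step toward the target surprise
    return token, mu

mu = 6.0                                            # threshold starts loose, then self-adjusts
for _ in range(5):
    logits = rng.standard_normal(50)                # stand-in for a model's next-token logits
    token, mu = mirostat_like_sample(logits, mu)
    print(int(token), round(mu, 3))
```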
Original diffusion: Adding noise to remove it
2025-08-04 theory image diffusion Some of the underlying theory for diffusion-type models, which have become popular for image generation. This paper is one of the original sources for the diffusion approach: it introduces not a specific model but the very general abstract concepts used in subsequent models. Access: Free account (logged in)
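A minimal sketch of the forward ("noising") half of that abstraction, with an illustrative variance schedule: the clean sample is mixed with Gaussian noise according to q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I), and the model's job is to learn to reverse this.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # a simple linear variance schedule (illustrative)
alpha_bar = np.cumprod(1.0 - betas)

def add_noise(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0): scaled signal plus Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

x0 = np.ones(4)
for t in (0, 500, 999):
    xt, _ = add_noise(x0, t)
    print(t, xt.round(2))   # by t=999 the signal has been destroyed; only noise remains
```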
Welcome to Matthew Explains
2025-08-01 meta Introductory posting and call for discussion. Access: Free account (logged in)
Rappaccini's language model
2025-08-01 alignment text toxicity There's a lot of talk about generative models producing "toxic" output, but what does that actually mean? How can we measure it or prevent it, and is it even a good idea to try? Access: $$$ Pro
Embeddings from generative models
2025-08-01 theory applications attention Mistral For text generation you usually want a "decoder" model; for other text tasks you usually want an "encoder." Here we look at modifying a decoder model to change it into an encoder. Access: $ Basic
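A hedged sketch of one common recipe for getting embeddings out of a decoder-only model (the post may use a different one): take the per-token hidden states and pool them, mean pooling here. "gpt2" is just a small stand-in decoder so the example runs quickly.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def embed(text):
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # [1, seq_len, d_model]
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean-pool over tokens

a = embed("a cat sat on the mat")
b = embed("a kitten rested on the rug")
print(torch.cosine_similarity(a, b))                        # higher = more similar
```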
Features are not what you think
2025-08-01 theory security image Two interesting things about neural network image classifiers: one, the individual neurons don't seem to be special in terms of detecting meaningful features; and two, it's frighteningly easy to construct adversarial examples that will fool the classifier. Access: $ Basic
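A sketch of one classic recipe for constructing adversarial examples, the fast gradient sign method; the paper covered in the post may construct them differently, and the tiny linear "classifier" below is only a placeholder so the code runs.

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=0.03):
    """Nudge every pixel by +/- eps in whichever direction increases the loss."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    return (image + eps * image.grad.sign()).clamp(0, 1).detach()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(1, 3, 32, 32), torch.tensor([7])
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())   # perturbation stays <= eps, visually imperceptible
```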
The road to MoE
2025-08-01 model-intro text DeepSeek MoE General coverage of the "Mixture of Experts" (MoE) technique, and specific details of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements to it, as well as some load-balancing tricks, all of which went into their recently-notable model. Access: $$$ Pro
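A minimal sketch of MoE routing with a shared expert that every token always uses, plus a router that sends each token to its top-k specialised experts. Sizes, weight normalization, and the load-balancing tricks are omitted or illustrative, not DeepSeek's implementation.

```python
import torch
import torch.nn as nn

d_model, n_experts, top_k = 16, 8, 2
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
shared_expert = nn.Linear(d_model, d_model)
router = nn.Linear(d_model, n_experts)

def moe_layer(x):                                   # x: [tokens, d_model]
    scores = router(x).softmax(dim=-1)              # routing probabilities per token
    weights, chosen = scores.topk(top_k, dim=-1)    # each token picks its top-k experts
    out = shared_expert(x)                          # the shared expert sees every token
    for slot in range(top_k):
        for e in range(n_experts):
            mask = chosen[:, slot] == e
            if mask.any():                          # only the chosen experts do any work
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_layer(torch.randn(5, d_model)).shape)     # torch.Size([5, 16])
```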
Bidirectional attention and BERT: Taking off the mask
2025-08-01 model-intro text BERT attention Introduction to BERT, a transformer-type model with bidirectional attention, suited to interesting tasks other than plain generation. This was one of the first powerful models to have open weights; and it remains a common baseline to which new models can be compared. Access: $ Basic
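A quick illustration (not from the post) of BERT's masked-token objective, using the Hugging Face fill-mask pipeline with the original BERT weights.

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("Paris is the [MASK] of France."):
    print(candidate["token_str"], round(candidate["score"], 3))
# Because attention is bidirectional, the prediction for [MASK] can draw on
# context from both its left and its right.
```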
Grammar is all you get
2025-08-01 model-intro basics text AIAYN attention An overview of the classic "Attention is all you need" paper, with focus on the attention mechanism and its resemblance to dependency grammar. Access: $ Basic
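A minimal sketch of the scaled dot-product attention at the heart of that paper: each position's output is a weighted average of value vectors, with weights given by softmaxed query-key similarity.

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over keys
    return weights @ V                                       # weighted sum of value vectors

seq_len, d_k = 4, 8
Q = K = V = np.random.randn(seq_len, d_k)                    # self-attention: all from the same sequence
print(attention(Q, K, V).shape)                              # (4, 8)
```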
Cheap fine-tuning with LoRA
2025-08-01 basics training text image GPT LoRA Rather than retraining a model's entire large weight matrices, we can train smaller, cheaper adjustments that function like software patches. Access: $$$ Pro
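A minimal sketch of the LoRA idea with illustrative shapes: keep the big pretrained matrix W frozen and learn only a low-rank update B @ A, so the effective weight is W + (alpha / r) * B @ A.

```python
import torch
import torch.nn as nn

d_in, d_out, r, alpha = 512, 512, 8, 16
W = torch.randn(d_out, d_in, requires_grad=False)      # frozen pretrained weights
A = nn.Parameter(torch.randn(r, d_in) * 0.01)          # trainable: d_in -> r
B = nn.Parameter(torch.zeros(d_out, r))                # trainable: r -> d_out (starts at zero)

def lora_linear(x):
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)     # original path + low-rank "patch"

x = torch.randn(2, d_in)
print(lora_linear(x).shape)                            # torch.Size([2, 512])
# Only A and B (2 * r * d_in = ~8k values here) are updated during fine-tuning,
# instead of the full d_out * d_in = ~262k values in W.
```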
Better (than) tokenization with BLTs
2025-08-01 theory text LLaMA tokenization Using "patches" of input bytes, instead of a fixed token list, allows better scalability and improves performance on some tasks that are hard for token-based LLMs. Access: $ Basic
Generate and read: Oh no they didn't
2025-05-21 prompting text GPT RAG hallucination What if instead of looking up facts in Wikipedia, you just used a language model to generate fake Wikipedia articles? Access: Public
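A hedged sketch of that "generate, then read" idea: instead of retrieving a real document, ask a language model to write one and then answer the question from the generated context. gpt2 is only a small stand-in so the example runs; the setup the post discusses uses far larger models.

```python
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

question = "When was the first transatlantic telegraph cable completed?"

# Step 1: generate a background "article" rather than looking one up.
article = generate(
    f"Write a short encyclopedia article that answers: {question}\n\nArticle:",
    max_new_tokens=80,
)[0]["generated_text"]

# Step 2: read the generated article and answer the question from it.
answer = generate(
    f"{article}\n\nQuestion: {question}\nAnswer:",
    max_new_tokens=20,
)[0]["generated_text"]
print(answer)
```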