North Coast Synthesis Ltd.

Data for testing logical inference

2025-09-08 Video training tools text logic This short paper introduces a dataset, or rather software for generating one, to test language models' handling of chains of logical inference. Access: $ Basic

What's a Model?

2025-09-01 Video alignment basics theory text Gemma hallucination What do we actually mean when we talk about a "model"? Where do they come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots. Access: Public

Latent Diffusion

2025-08-25 Video model-intro image diffusion The highly abstract "diffusion" model concept gets one more significant development: wrapping the model inside an autoencoder that translates between the high-dimensional pixel space and a lower-dimensional latent space with semantic properties. Running a diffusion model inside the latent space has theoretical and practical advantages, and the authors of the paper apply that to a range of image-generation problems. Access: $$$ Pro

Rotary Position Encoding

2025-08-18 Video basics text AIAYN tokenization I review position encoding - why it's needed, and how classic Transformers do it - and then go into detail on Rotary Position Embedding (RoPE), an enhancement to position encoding that is widely used in recent large language models. Access: $ Basic

Believable sampling with Mirostat

2025-08-11 Video Poll basics text sampling It's often hard to choose the right sampling parameters for language generation. This paper introduces Mirostat, a technique for adaptively choosing the value of "k" in top-k sampling to give easier and more consistent control over the information density of the output. Access: $ Basic

Original diffusion: Adding noise to remove it

2025-08-04 Video theory image diffusion Some of the underlying theory for diffusion-type models, which have become popular for image generation. This paper is one of the original sources for the diffusion approach; it introduces not a specific model but the very general abstract concepts used in subsequent models. Access: Free account (logged in)

Welcome to Matthew Explains

2025-08-01 Poll meta Introductory posting and call for discussion Access: Free account (logged in)

Rappaccini's language model

2025-08-01 Video alignment text toxicity There's a lot of talk about generative models producing "toxic" output, but what does that actually mean? How can we measure it or prevent it, and is it even a good idea to try? Access: $$$ Pro

Embeddings from generative models

2025-08-01 Video theory applications attention Mistral For text generation you usually want a "decoder" model; for other text tasks you usually want an "encoder." Here we look at modifying a decoder model to change it into an encoder. Access: $ Basic

Features are not what you think

2025-08-01 Video theory security image Two interesting things about neural network image classifiers: one, the individual neurons don't seem to be special in terms of detecting meaningful features; and two, it's frighteningly easy to construct adversarial examples that will fool the classifier. Access: $ Basic

The road to MoE

2025-08-01 Video model-intro text DeepSeek MoE General coverage of the "Mixture of Experts" (MoE) technique, with specific details of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements to it, plus some load-balancing tricks, all of which went into their recently notable model. Access: $$$ Pro

Bidirectional attention and BERT: Taking off the mask

2025-08-01 Video model-intro text BERT attention Introduction to BERT, a transformer-type model with bidirectional attention, suited to interesting tasks other than plain generation. This was one of the first powerful models to have open weights, and it remains a common baseline to which new models can be compared. Access: $ Basic

Grammar is all you get

2025-08-01 Video model-intro basics text AIAYN attention An overview of the classic "Attention is all you need" paper, with focus on the attention mechanism and its resemblance to dependency grammar. Access: $ Basic

Cheap fine-tuning with LoRA

2025-08-01 Video basics training text image GPT LoRA Rather than re-training the entire large matrices of a model, we can train smaller, cheaper adjustments that function like software patches. Access: $$$ Pro

Better (than) tokenization with BLTs

2025-08-01 Video theory text LLaMA tokenization Using "patches" of input bytes, instead of a fixed token list, allows better scalability and improves performance on some tasks that are hard for token-based LLMs. Access: $ Basic

Generate and read: Oh no they didn't

2025-05-21 Video prompting text GPT RAG hallucination What if instead of looking up facts in Wikipedia, you just used a language model to generate fake Wikipedia articles? Access: Public

Dog-whistle GANs

2025-05-21 Video basics training security theory image GAN Generative Adversarial Nets, and their implications for watermarking generated text. Access: Public