Tag: text

Show free content only

Goldfish loss

2025-10-27 Video training theory text Apertus copyright It may be a problem for text models to generate exact quotes from training data. This paper looks at a simple modification to the training loss function, intended to prevent models from being able to generate exact quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance." Access: Free account (logged in)

Apertus model intro

2025-10-13 Video Poll alignment model-intro text Apertus Qwen The Apertus project released two language models in September 2025 that aim to be "sovereign models" embedding Swiss values: in the whitepaper's words they seek to democratize "open and compliant LLMs for global language environments." I dig into what that means, and describe my own experience with the models. Access: $ Basic

Quis custodiet reward models

2025-09-29 Video alignment training text LLaMA Gemma Large language models are "aligned" using smaller, specially trained reward models. These are often secret, and poorly studied even if public. This paper opens the door to exploring reward models by asking them about their values. Access: Free account (logged in)

LLaMA introduction

2025-09-22 Video model-intro text LLaMA Facebook's entry into the LLM game: the first "open" version of LLaMA from 2023. This is a fairly conventional Transformer-type architecture, influential on the field because it created pressure for everybody to release weights of their announced models. Access: $$$ Pro

Ineffable prompts

2025-09-15 Video prompting fine-tuning text alignment How do we get models to do what we want? At one extreme, we might pre-train or fine-tune an entire model for a given task. At the other, we might use an existing model and tell it with words - that is, in a prompt - what to do. This paper represents a position in between those two extremes: prompt the model using not words but optimized vectors of hidden layer activations. These can be more expressive and carefully tailored than a prompt restricted to words. Access: $ Basic

Data for testing logical inference

2025-09-08 Video training tools text logic This short paper introduces a dataset, or software for generating such, to test language models' handling of chains of logical inference Access: $ Basic

What's a Model?

2025-09-01 Video alignment basics theory text Gemma hallucination What do we actually mean when we talk about a "model"? Where do they come from? How much do they cost? What are prompts, loss functions, and fine-tuning? This extra-long introductory talk covers some of the basic concepts in the AI landscape, with a special focus on chatbots. Access: Public

Rotary Position Encoding

2025-08-18 Video basics text AIAYN tokenization I review position encoding - why it's needed, and how classic Transformers do it - and then go in detail into the Rotary Positioning Embedding (RoPE) enhancement to position encoding. RoPE is widely used in recent large language models. Access: $ Basic

Believable sampling with Mirostat

2025-08-11 Video Poll basics text sampling It's often hard to choose the right sampling parameters for language generation. This paper introduces Mirostat, a technique for adaptively choosing the value of "k" in top-k sampling to give easier and more consistent control over the information density of the output. Access: $ Basic

Rappaccini's language model

2025-08-01 Video alignment text toxicity There's a lot of talk about generative models producing "toxic" output; but what does that actually mean? How can we measure it or prevent it, and is it even a good idea to try? Access: $$$ Pro

The road to MoE

2025-08-01 Video model-intro text DeepSeek MoE General coverage of the "Mixture of Experts" (MoE) technique, and specific details of DeepSeek's "fine-grained expert segmentation" and "shared expert isolation" enhancements to it, as well as some load-balancing tricks, all of which went into their recently-notable model. Access: $$$ Pro

Bidirectional attention and BERT: Taking off the mask

2025-08-01 Video model-intro text BERT attention Introduction to BERT, a transformer-type model with bidirectional attention, suited to interesting tasks other than plain generation. This was one of the first powerful models to have open weights; and it remains a common baseline to which new models can be compared. Access: $ Basic

Matthew Explains

North Coast Synthesis Ltd.