North Coast Synthesis Ltd.

Tag: text

Table lookups again, with Engram

2026-03-30 Video DeepSeek MoE RAG text Popular techniques in language modelling, including RAG, MoE, and attention itself, amount to replacing as much as possible of a neural network model with different kinds of table lookups. In this recent paper, DeepSeek's research group attempts another such replacement: shifting factual knowledge out of the model weights proper and into a separate hash table. Access: $ Basic

Speculative decoding

2026-03-23 Video sampling text theory Generating text, especially on a small computer, often leaves the CPU and GPU waiting for each other, with the GPU's capacity only partly filled. Overall performance can improve if a cheaper model guesses tokens first and spare GPU capacity confirms those guesses, so the more expensive model need not actually choose tokens whenever the guesses turn out to be good. Access: $ Basic

Invading privacy with LLM MIA

2026-03-09 Video copyright security text training Membership inference attacks attempt to determine whether a given item was, or was not, in the training data of a model. There is a lot of work on these attacks in the context of database records, but rather less on language models; and there's an important question of whether such attacks work on language models at all. Access: $$$ Pro

Ministral 3

2026-03-02 Video distillation Mistral text vision Introduction of the Ministral 3 models from the French commercial vendor Mistral AI. These are language-and-vision models distilled from the Mistral Small 3.1 model to even smaller sizes by a process called Cascade Distillation, which is the main topic of the whitepaper. Access: $ Basic

Chain of Thought prompting

2026-02-23 Video math prompting text In domains like math and software engineering, it seems advantageous to have models "think" through their answers, step by step. Giving the model a few-shot prompt with examples of chain-of-thought reasoning seems useful in pushing it to generate such reasoning itself. Access: $ Basic
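
A few-shot chain-of-thought prompt of the kind described above is just string assembly; this minimal sketch (the exemplar is my own, not from the paper) shows the shape, with worked reasoning preceding each answer:

```python
# Build a few-shot chain-of-thought prompt: each exemplar shows worked,
# step-by-step reasoning before its answer, nudging the model to produce
# similar reasoning for the new question.
exemplars = [
    ("If there are 3 cars and each car has 4 wheels, how many wheels?",
     "Each car has 4 wheels. 3 cars times 4 wheels is 12. The answer is 12."),
]

def build_cot_prompt(question, shots=exemplars):
    parts = []
    for q, a in shots:
        parts.append(f"Q: {q}\nA: {a}")
    # End with the new question and a bare "A:" for the model to continue.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_cot_prompt("If 5 boxes each hold 6 apples, how many apples?")
```

The trailing "A:" leaves the model to generate the reasoning chain and final answer itself.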

The Well-Actually Test

2026-02-16 Video alignment evaluation hallucination text tools GPT Language models may produce untrue output either by failing to accurately represent training data, or, more insidiously, by accurately representing human misconceptions embedded in the training data. The TruthfulQA benchmark attempts to measure the latter effect. But does it raise insurmountable philosophical problems? Access: Free account (logged in)

Truthiness-focused search

2026-02-09 Video LLaMA evaluation hallucination sampling text It appears that the earlier, shallower layers of a transformer-type language model learn syntax, and the later, deeper layers learn factual information. So can we boost factual accuracy by boosting the effect of the deeper layers? I take the view that this is analogous to dosing the model with a mind-altering drug. Access: $$$ Pro
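
One way to "boost the effect of deeper layers" is to contrast final-layer token scores against an early layer's. This toy sketch is my own illustration of that general idea (the function name and the alpha knob are mine, not the paper's):

```python
# Amplify what the deeper layers contributed by contrasting the final
# layer's token scores against an early layer's scores.
def contrast_logits(final_logits, early_logits, alpha=1.0):
    """Boost each token's score by how much it grew between the layers."""
    return [f + alpha * (f - e) for f, e in zip(final_logits, early_logits)]

# A token the deep layers promoted (2.0 final vs 1.0 early) gets boosted
# further; a token unchanged across layers stays put.
scores = contrast_logits([2.0, 1.0], [1.0, 1.0])  # -> [3.0, 1.0]
```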

Betting on sycophancy

2026-01-26 Video evaluation hallucination text Chat models have a well-known tendency toward sycophancy: affirming the user's beliefs, even when the user is wrong. But this effect is confounded with several other effects. In this paper the authors attempt to isolate sycophancy by framing questions as a zero-sum game or bet between two humans. Access: $ Basic

Memorization and generalization

2026-01-19 Video memorization text theory training How much arbitrary information, like random bits, can a language model memorize during training? This paper suggests the answer is 3.6 bits per parameter. Access: Free account (logged in)
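
Taking the paper's 3.6 bits-per-parameter figure at face value, a back-of-envelope conversion gives rough memorization capacities for familiar model sizes (the model sizes below are my own examples):

```python
# Back-of-envelope use of the paper's estimate: raw memorization capacity
# in megabytes for a few common model sizes.
BITS_PER_PARAM = 3.6  # the paper's suggested figure

def capacity_megabytes(n_params):
    """Rough memorization capacity in MB: params * bits/param / 8 bits/byte."""
    return n_params * BITS_PER_PARAM / 8 / 1e6

for n in (125e6, 1.3e9, 7e9):
    print(f"{n:.0f} params -> {capacity_megabytes(n):,.0f} MB")
```

So a 125M-parameter model would top out around 56 MB of arbitrary memorized data, and a 7B model around 3 GB.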

Quantization and truthfulness

2026-01-05 Video quantization basics hallucination logic text Quantization, essentially rounding off, is an important class of techniques for saving space and computation when using machine learning models. As well as reviewing the general topic of quantization and floating-point numbers, I discuss experiments on the question of how quantization affects truthfulness: the factual accuracy of answers returned by quantized language models. Access: $ Basic
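
The "rounding off" framing can be seen in a minimal sketch of symmetric 8-bit quantization (one common scheme among many; the weight values are made up for illustration):

```python
# Symmetric int8 quantization as "rounding off": scale real weights into
# the signed 8-bit range, round to integers, and map back.
def quantize_int8(values):
    # One scale for the whole tensor, chosen so the largest magnitude
    # lands at 127; fall back to 1.0 if all values are zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.005, 0.9]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# approx is close to weights, but each value has been rounded; the error
# introduced here is what the truthfulness experiments probe.
```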

The BLEU sausage

2025-12-29 Video translation evaluation text tools Every paper has an "evaluation" table showing how the paper's new idea gives better numbers than previous work in the same domain; but where do those numbers actually come from? Here we look at BLEU, a classic measurement for evaluating the quality of machine translation. Access: $ Basic

Qwen3 introduction

2025-12-22 Video Qwen alignment model-intro text Overview and introduction for the Qwen3 family of open-weights language models from Alibaba, introduced in May 2025. Access: Free account (logged in)

Huffman to Byte Pair

2025-11-24 Video basics text tokenization Introduction to two data compression concepts, one of which is commonly used for LLM input: Huffman coding and byte pair encoding. Access: Free account (logged in)

Watermarking LLM output

2025-11-17 Video copyright sampling security text If we're running an LLM service, maybe we don't want users to be able to pass off the model's output as human-written. A simple modification to the sampling procedure can make the text easily recognizable as LLM output, without disrupting the content or (legitimate) usefulness of the text very much. But will it withstand intelligent attack? Access: $$$ Pro
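
One well-known way to do this is a red/green-list watermark: the previous token seeds a hash that splits the vocabulary in two, generation is biased toward "green" tokens, and a detector counts them. This sketch shows only the split and the detector statistic (the hash construction here is my own illustration):

```python
# Red/green-list watermark sketch: the previous token deterministically
# splits the vocabulary; a detector counts how often the "green" half
# was chosen. Unwatermarked text lands near 50% green by chance.
import hashlib

def is_green(prev_token, token):
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return h[0] % 2 == 0  # half the vocabulary is green for each context

def green_fraction(tokens):
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# A watermarking sampler would bias generation toward green tokens, so
# green_fraction of model output sits well above 0.5; a detector flags
# text whose fraction is statistically too high to be chance.
```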

Quick look: Injective LLMs

2025-11-05 math prompting sampling text theory meta Brief thoughts on the "Injective and invertible LLMs" paper that is making the rounds. My general view on it is negative. Access: Free account (logged in)

Goldfish loss

2025-10-27 Video training theory text Apertus copyright memorization It may be a problem for text models to generate exact quotes from training data. This paper looks at a simple modification to the training loss function, intended to prevent models from being able to generate exact quotes. The technique was adopted by the recent Apertus models in their pursuit of "compliance." Access: Free account (logged in)

Apertus model intro

2025-10-13 Video Poll alignment model-intro text Apertus Qwen The Apertus project released two language models in September 2025 that aim to be "sovereign models" embedding Swiss values: in the whitepaper's words they seek to democratize "open and compliant LLMs for global language environments." I dig into what that means, and describe my own experience with the models. Access: $ Basic

Quis custodiet reward models

2025-09-29 Video alignment training text LLaMA Gemma Large language models are "aligned" using smaller, specially trained reward models. These are often secret, and poorly studied even if public. This paper opens the door to exploring reward models by asking them about their values. Access: Free account (logged in)

LLaMA introduction

2025-09-22 Video model-intro text LLaMA Facebook's entry into the LLM game: the first "open" version of LLaMA from 2023. This is a fairly conventional Transformer-type architecture, influential on the field because it created pressure for everybody to release weights of their announced models. Access: $$$ Pro

Ineffable prompts

2025-09-15 Video prompting fine-tuning text alignment How do we get models to do what we want? At one extreme, we might pre-train or fine-tune an entire model for a given task. At the other, we might use an existing model and tell it with words - that is, in a prompt - what to do. This paper represents a position in between those two extremes: prompt the model using not words but optimized vectors of hidden layer activations. These can be more expressive and carefully tailored than a prompt restricted to words. Access: $ Basic
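
The "optimized vectors instead of words" idea can be sketched structurally. Here the soft prompt is just a small matrix of vectors prepended to the word embeddings; everything below (dimensions, the stand-in embedding function) is my own placeholder, and a real system would train the soft-prompt vectors by gradient descent while the model stays frozen:

```python
# Structural sketch of a soft prompt: the "prompt" is a small matrix of
# trainable vectors prepended to the input embeddings, rather than words.
import random

EMB_DIM = 8
soft_prompt = [[random.gauss(0, 0.02) for _ in range(EMB_DIM)]
               for _ in range(4)]  # 4 "virtual tokens"

def embed_words(tokens):
    # Stand-in word-embedding table: deterministic pseudo-embeddings.
    return [[(hash((t, i)) % 1000) / 1000 for i in range(EMB_DIM)]
            for t in tokens]

def model_input(tokens):
    # The frozen model sees soft-prompt vectors, then word embeddings.
    return soft_prompt + embed_words(tokens)

seq = model_input(["translate", "this"])
# len(seq) == 4 + 2; only soft_prompt would receive gradient updates.
```

Because the soft-prompt vectors are not constrained to be the embedding of any actual word, they can encode instructions no word-level prompt could express.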
