Tag: toxicity
Call the Science Police
2026-05-25 alignment sampling text toxicity
Proposal to improve the scientific accuracy of LLM output in domains like medicine, by using a larger model to write executable rules that are applied to a smaller model's output at search time.
Access: $ Basic
Rappaccini's language model
2025-08-01 alignment text toxicity
There's a lot of talk about generative models producing "toxic" output; but what does that actually mean? How can we measure it or prevent it, and is it even a good idea to try?
Access: $$$ Pro