Tag: toxicity
Rappaccini's language model
2025-08-01 alignment text toxicity
There's a lot of talk about generative models producing "toxic" output; but what does that actually mean? How can we measure it or prevent it, and is it even a good idea to try?
Access: $$$ Pro