Tag: security
Invading privacy with LLM MIA
2026-03-09 · copyright, security, text, training
Membership inference attacks attempt to determine whether a given item was, or was not, in the training data of a model. There is a lot of work on these attacks in the context of database records, but rather less on language models; and there's an important question of whether such attacks work on language models at all.
Access: $$$ Pro
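The simplest family of these attacks is loss-based: query the model on a candidate item, and guess "member" if the loss is below some threshold, on the theory that models fit their training data more tightly than unseen data. A minimal sketch of the idea, using a toy bigram language model in place of an LLM (the model, smoothing, and threshold here are all illustrative assumptions, not taken from the article):

```python
import math
from collections import Counter

def train_bigram(corpus):
    # "Train" a toy bigram language model by counting adjacent token pairs.
    counts, totals = Counter(), Counter()
    for sent in corpus:
        toks = sent.split()
        for a, b in zip(toks, toks[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return counts, totals

def nll(model, sent, vocab_size=1000):
    # Average negative log-likelihood per bigram, with add-one smoothing
    # so unseen pairs get small but nonzero probability.
    counts, totals = model
    pairs = list(zip(sent.split(), sent.split()[1:]))
    if not pairs:
        return float("inf")
    total = 0.0
    for a, b in pairs:
        p = (counts[(a, b)] + 1) / (totals[a] + vocab_size)
        total -= math.log(p)
    return total / len(pairs)

def is_member(model, sent, threshold):
    # Loss-based membership inference: unusually low loss suggests
    # the sentence was in the training data.
    return nll(model, sent) < threshold

model = train_bigram(["the cat sat on the mat", "dogs chase cats"])
```

On real LLMs the open question the article flags is exactly whether this loss gap is large and reliable enough to exploit; the toy model only shows the mechanics of the test.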
Watermarking LLM output
2025-11-17 · copyright, sampling, security, text
If we're running an LLM service, maybe we don't want users to be able to pass off the model's output as human-written. A simple modification to the decoding search can make the text easily recognizable as LLM output, without much disrupting the content or the (legitimate) usefulness of the text. But will it withstand intelligent attack?
Access: $$$ Pro
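One well-known way to modify decoding along these lines (not necessarily the one the article uses) is a "green list" scheme: at each step, pseudorandomly partition the vocabulary using the previous token as a seed, bias sampling toward the green half, and detect the watermark later by counting how many tokens land in their green lists. A toy sketch, with a uniform sampler standing in for the model and all names and parameters invented for illustration:

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(100)]

def green_list(prev_token, fraction=0.5):
    # Pseudorandomly partition the vocabulary, seeded by the previous token,
    # so generator and detector agree on the partition without sharing state.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * fraction)])

def generate(length, seed=0):
    # Stand-in "model": sample uniformly, but only from the green list.
    # (A real scheme would add a logit bias rather than a hard constraint.)
    rng = random.Random(seed)
    out = ["tok0"]
    for _ in range(length):
        out.append(rng.choice(sorted(green_list(out[-1]))))
    return out

def green_fraction(tokens):
    # Detector: watermarked text is almost all green; human text is ~50%.
    hits = sum(1 for a, b in zip(tokens, tokens[1:]) if b in green_list(a))
    return hits / max(1, len(tokens) - 1)
```

The "intelligent attack" question is then whether paraphrasing or token substitution can push the green fraction back toward chance without wrecking the text.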
Features are not what you think
2025-08-01 · theory, security, image
Two interesting things about neural network image classifiers: one, the individual neurons don't seem to be special in terms of detecting meaningful features; and two, it's frighteningly easy to construct adversarial examples that fool the classifier.
Access: $ Basic
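The standard recipe for "frighteningly easy" adversarial examples is the fast gradient sign method: nudge every input dimension a tiny amount in the direction that increases the loss. A minimal sketch on a logistic-regression classifier (chosen so the gradient is exact in a few lines; the weights and epsilon are illustrative assumptions, not from the article):

```python
import numpy as np

def fgsm(x, w, b, label, eps):
    # Fast gradient sign method for a binary logistic classifier:
    # perturb x by eps in the sign of the loss gradient w.r.t. the input.
    logit = w @ x + b
    p = 1.0 / (1.0 + np.exp(-logit))
    grad = (p - label) * w          # d(cross-entropy loss)/dx
    return x + eps * np.sign(grad)

# A point correctly classified as class 1 with a modest margin...
w = np.full(100, 0.1)
b = 0.0
x = np.full(100, 0.02)

# ...flips class under a perturbation of at most eps per pixel.
adv = fgsm(x, w, b, label=1, eps=0.05)
```

In high dimensions many tiny per-coordinate changes add up to a large change in the logit, which is why an imperceptible perturbation suffices.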