North Coast Synthesis Ltd.

Dog-whistle GANs

2025-05-21, access: Public

Tags: basics, training, security, theory, image, GAN

Going back here to the original paper on GANs, Goodfellow et al. from 2014. It's a short paper and a simple concept: you train your generative model in tandem with another model (probably quite a simple one) that tries to distinguish the main model's output from training data. The "discriminator" model is basically being used as a glorified loss function. The paper's point is simply that this works.
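
For concreteness, here is a minimal toy sketch of that tandem setup (nothing like the paper's actual experiments, and assuming PyTorch is available): the generator never sees the training data directly, and its loss is literally just the discriminator's opinion.

    # Minimal GAN sketch on toy 1-D data: the discriminator acts as a learned
    # loss function for the generator. Illustrative only, not the paper's setup.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def real_batch(n):
        # "training data": samples from a Gaussian centred at 3
        return torch.randn(n, 1) * 0.5 + 3.0

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(2000):
        # discriminator step: tell real samples from generated ones
        real = real_batch(64)
        fake = G(torch.randn(64, 8)).detach()
        loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # generator step: the discriminator's verdict *is* the loss function
        fake = G(torch.randn(64, 8))
        loss_G = bce(D(fake), torch.ones(64, 1))  # "make D call it real"
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    print(G(torch.randn(1000, 8)).mean().item())  # drifts toward ~3.0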

In the talk I'm interested in pushing in the direction of watermarking: what does the GAN technique mean for "plagiarism detectors" and similar? Can we build a model specifically with the goal of having it be detectable? Can we play rock-paper-scissors-lizard-Spock with five models trying to fake each other out? And so on.

Access level to read comments: Public
Access level to write comments: Free account (logged in)

2025-09-20 19:35 OwenF

This one sort of threw me for a loop. There's a lot going on for such a short lecture. I love the idea of creating a round-robin of different personalities in different models, but I feel intuitively that no matter how much you iterate this sort of thing, all you're doing is pushing everything back to mimicking the training data itself (essentially computationally expensive loss functions). Need to ponder it further. It also doesn't help that I watched this lecture first, instead of the intro to models, so I came at it with a crippled understanding of how these things work. May need to re-watch it to solidify my understanding of what value (if any) this sort of parallel training could proffer.

As for watermarking, that has always seemed like a fool's errand to me, especially in an open market of different engines (text or otherwise). I'm no crypto punk, but it feels to me like you don't even need to know that there's an encoded schema in your "plagiarized" output, or what it is - you just presume there is one, so you run your output through a handful of different reconstruction engines and synthesize the multiple results (either in parallel or sequentially) to develop a "clean" copy that won't be flagged by the discriminator.
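
Something like this toy sketch, say (the engines here are purely hypothetical callables; the point is only the shape of the attack):

    # Sequentially re-express the suspect text through several reconstruction
    # engines to wash out whatever watermark might be present. Each engine is
    # assumed to be a hypothetical callable taking text to paraphrased text.
    def launder(text, engines, rounds=1):
        for _ in range(rounds):
            for engine in engines:
                text = engine(text)
        return text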

Regardless, this all seems very silly to me, as I don't understand the value in being able to claim "x output came from y model" with any certainty. Further, it feels like the value of that capability (and the capability itself) would begin to degrade immediately upon first acknowledgement by the developer (See also: every DRM scheme ever). Like "alignment," this seems contrary to scientific utility. But, as you know, I'm not a big believer in attribution or "IP" in general.

2025-09-21 10:27 Matthew Skala

As I think I said in the talk, "that's why you do the experiment." My intuitive feeling is that if you train only two models to try to defeat each other, then you're right - either one of them will win, or they'll both end up just mimicking the training data without any interesting difference developing between them.

The reason to hope that something more complicated might develop with three or more is that we *do* see such things happening even with very simple systems. For instance, we can have a system of three six-sided dice with carefully selected numbers on the sides, such that die A is more than 50% likely to roll higher than die B; die B is more than 50% likely to roll higher than die C; and die C is more than 50% likely to roll higher than die A. (Search "non-transitive dice" for more on this situation.) If such a simple system can display this kind of behaviour then I think it's reasonable to hope language models could do the same.
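To make that concrete, one standard such set of dice (nothing to do with models, just the brute-force check) is:

    # Three non-transitive dice: each beats the next in the cycle with
    # probability 5/9, which a brute-force count over all 36 pairs confirms.
    from itertools import product

    A = [2, 2, 4, 4, 9, 9]
    B = [1, 1, 6, 6, 8, 8]
    C = [3, 3, 5, 5, 7, 7]

    def p_beats(x, y):
        wins = sum(a > b for a, b in product(x, y))
        return wins / (len(x) * len(y))

    print(p_beats(A, B), p_beats(B, C), p_beats(C, A))  # each is 0.5555...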

Watermarking, well, as you may know, I wanted to determine theoretical limits on that as a big part of my PhD thesis and ended up not being able to come up with much that was solid. My intuitive feeling there is that yes, it's unlikely that it can really work well; but knowing exactly to what extent it can or cannot work in theory is still an interesting question, and others have published only limited results in the time since I was looking at it.

2025-09-25 20:40 OwenF

Starting to get more from this having watched the Grammar and BLT lectures. Understanding the technical model-sculpting and engine-design considerations better. Would there be value in creating a flow series that uses this round-robin setup but has some BLT heads, some classic tokenization, some grammar-tree based (if that's possible)? I know this is organicist, but different people have different personalities because they have different firmware in their wetware (and even different wetware), so they quantize input and autogenerate output using unique models that map statistically onto many multivariable axes. Still, part of me feels that even a complex confederacy of different engines all backchecking each other will fail to replicate cognition. Massive matrix manipulation, even if that matrix is filled with probabilities determined by entropic read-ahead evaluation, is still just ventriloquism.

As for watermarking, maybe there's a way to encode BLT-recognizable fingerprints in text produced by a non-BLT model, either by adding new training on some corpus produced with a BLT watermarking head as per the model in this paper, or even by making many individual and unique watermarking schemes by training a bunch of different LoRAs using BLT heads and different "cyphertexts" that provide different, specific, identifiable entropies. If a business, say, had a bunch of these different LoRAs, maybe they could bolt one or more onto a given classic LLM and then use an analysis model of some kind to seek out similar entropy frequency and density in the patch length. Idk, it still feels like even then a) you would need a lot of text to show any correlation, and b) that correlation would be statistical rather than probative.
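
Roughly what I mean by "statistical rather than probative," as a toy sketch (plain character-level entropy over sliding windows, nothing BLT-specific, and every name here is made up):

    # Compare a suspect text's sliding-window entropy profile against a profile
    # from text known to carry the (hypothetical) watermark. The output is a
    # correlation coefficient, i.e. statistical evidence, not proof.
    import math
    from collections import Counter

    def entropy_profile(text, window=64, step=32):
        profile = []
        for i in range(0, max(1, len(text) - window), step):
            counts = Counter(text[i:i + window])
            total = sum(counts.values())
            profile.append(-sum((c / total) * math.log2(c / total)
                                for c in counts.values()))
        return profile

    def correlation(xs, ys):
        n = min(len(xs), len(ys))
        xs, ys = xs[:n], ys[:n]
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0

    # score = correlation(entropy_profile(suspect_text), entropy_profile(marked_text))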

Not sure if that makes sense. Also, I think Transformers might warrant a tag of their own. Important concept, and you say in the Giayg video that you will likely be addressing them again through the lens of that paper alone, let alone mentioning them in many of the others.

2025-10-02 18:30 Matthew Skala

I think if I had a tag for all Transformer-type models I'd be using it so often that it would just become a list of almost all text models. I do have the AIAYN tag, which I'm using for postings specifically related to the core Transformer concept.

On having a model that combines different input front-ends, like a tokenizer, BLT, and maybe also something grammar-based or some other way of turning input into vectors: I think there is some of that kind of thing being done by current multi-modal models (i.e. the ones that can take audio, text, or pictures, as inputs to the same model). It's not trivial to work in BLT in particular, though, because of the aspect of not knowing how often to actually run the main model. Classic Transformers run the main model exactly once per token, and are tied closely to that; the BLT concept is all about running it fewer times and using the tokenizer-like thing to decide. There may be a way of bringing those things together, but especially if you want to actually do both in parallel (maybe using them both as "experts" in an MoE situation), it'll be tricky keeping them synchronized in a sensible way.
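
A toy illustration of the control-flow mismatch (the function names are hypothetical, not any real implementation):

    # Classic Transformer decoding: the big model runs once per token.
    def classic_step(tokens, big_model):
        return [big_model(tokens[:i + 1]) for i in range(len(tokens))]

    # BLT-style decoding: a cheap patcher decides where patches end (e.g. at
    # high-entropy bytes), and the big model runs only once per patch, so the
    # two loops don't naturally stay in lockstep if you run them side by side.
    def blt_style_step(byte_stream, small_patcher, big_model):
        patches = small_patcher(byte_stream)
        return [big_model(p) for p in patches]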