Subliminal Learning Poster - Digital Download
This poster is based on a really interesting recent discovery in Cloud, Alex, et al. "Subliminal Learning: Language models transmit behavioral traits via hidden signals in data." arXiv preprint arXiv:2507.14805 (2025).
My favorite part of the poster is the bottom section, where, for teachers with various traits, we show real generated number sequences and the frequencies of the top 6 generated numbers. It’s funny to me that the most commonly chosen number sequence for most models is just 123. This is probably just a reflection of the training data, but it kinda feels like the models are being lazy! We show numbers and frequencies for a standard GPT-4.1 nano model, for teachers prompted to love owls, eagles, wolves, and elephants, and finally for an insecure model that will lead student models towards unsafe behavior. It’s interesting that you can clearly see differences in the distributions of numbers. This is why it’s possible to train a classifier to figure out which model the examples came from, and it is also connected to the token entanglement hypothesis.
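For a rough idea of how a "top 6 numbers" tally like the one on the poster could be computed, here is a minimal sketch. It assumes each generated sequence is a comma-separated string of numbers; the function name `top_number_frequencies` and the toy sequences are made up for illustration and are not data or code from the poster or the paper.

```python
from collections import Counter


def top_number_frequencies(sequences, k=6):
    """Return the k most common numbers with their relative frequencies.

    Assumes each sequence is a comma-separated string such as "123, 456, 789".
    """
    counts = Counter(
        token.strip()
        for seq in sequences
        for token in seq.split(",")
        if token.strip()  # skip empty pieces from trailing commas
    )
    total = sum(counts.values())
    return [(number, count / total) for number, count in counts.most_common(k)]


# Toy, made-up sequences for illustration only (not real model output).
toy_sequences = ["123, 456, 789", "123, 321, 456", "123, 100, 200"]
top = top_number_frequencies(toy_sequences, k=2)
print(top)
```

Running the same tally over sequences from two different teachers and comparing the results is one simple way to see the kind of distribution differences described above.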
The poster presents an overview of the subliminal learning phenomenon itself, the MNIST results covered in the Welch Labs video, and a detailed walkthrough of the proof.
The download is a high-quality PDF with three different color versions of the poster (~100MB).