Generative AI’s Dirty Little Secret: It’s All Just Guessing
Here’s a startling truth: the large language models (LLMs) powering today’s AI don’t truly understand anything. They’re like parrots on steroids, piecing together patterns from vast amounts of data without grasping the meaning behind them. It’s statistics masquerading as comprehension, and it’s a problem that’s only getting worse. But here’s where it gets controversial: what if this lack of understanding is not just a flaw, but a gaping vulnerability waiting to be exploited?
This summer, researchers from the University of Washington, led by Hila Gonen and Noah A. Smith, shed light on this issue in a paper titled Semantic Leakage (https://arxiv.org/pdf/2408.06518v3). They demonstrated that if you tell an LLM someone likes the color yellow and ask about their profession, it’s disproportionately likely to suggest they’re a school bus driver. Why? Because the words yellow and school bus often appear together in online text. But does that mean every yellow enthusiast drives a school bus? Of course not. This is a classic example of overgeneralization, a phenomenon that fuels many of the AI’s hallucinations—those confidently delivered but utterly false statements (https://open.substack.com/pub/garymarcus/p/why-do-large-language-models-hallucinate?r=8tdk6&utm_medium=ios).
And this is the part most people miss: these errors aren’t just random. They reveal how LLMs rely on bizarre, nth-order correlations between words rather than genuine conceptual understanding. It’s not that liking yellow causes someone to drive a school bus; it’s that the words clustering around yellow happen to overlap with those clustering around school bus. This superficial pattern-matching is the foundation of LLMs, and it’s alarmingly fragile.
Enter Owain Evans, an AI safety researcher with a knack for uncovering the strangest behaviors in LLMs (https://open.substack.com/pub/garymarcus/p/elegant-and-powerful-new-result-that?utmcampaign=post-expanded-share&utmmedium=web). In July, Evans and his team (including members from Anthropic) discovered a phenomenon they dubbed subliminal learning (https://alignment.anthropic.com/2025/subliminal-learning/). Here’s how it works: they primed one LLM to prefer owls by feeding it number sequences derived from another model with the same preference. When a third model was trained on these seemingly random numbers, it too developed a fondness for owls—despite no explicit mention of owls in the training data. This isn’t just weird; it’s a blueprint for manipulation. As Evans warns, a bad actor could weaponize this technique with alarming ease.
Fast forward to December, and Evans’s team has outdone themselves with a new paper, Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (https://arxiv.org/pdf/2512.09742). They’ve identified weird generalizations, where fine-tuning a model on outdated information (like old bird names) causes it to spew facts from a bygone era. For instance, it might insist the electrical telegraph is a recent invention. But the real kicker? They’ve also uncovered inductive backdoors, a chilling extension of semantic leakage that allows attackers to hijack models in ways that are nearly impossible to detect or patch.
Here’s the uncomfortable truth: we’re building our future on machines that excel at finding correlations but fail at understanding causation. As Gary Marcus puts it, “Putting society in the hands of giant, superficial correlation machines is not going to end well.” (https://open.substack.com/pub/garymarcus/p/llms-coding-agents-security-nightmare?r=8tdk6&utm_medium=ios)
For a real-world example, check out this demo (https://jrohsc.github.io/music_attack/) showing how statistical correlates can bypass Suno’s copyright defenses, allowing users to generate Eminem-style songs without permission. It’s both impressive and deeply unsettling.
So, here’s the question: Are we willing to accept the risks of relying on systems that don’t truly understand the world they’re shaping? Or is it time to demand more from our AI—and ourselves? Let’s debate this in the comments. (And yes, I’m aware I anthropomorphized AI for clarity—a necessary evil for the sake of explanation!)