
The LLM Unconscious: When AI Learns to Want Without Knowing Why

5/30/2025
10 min read

What happens when artificial intelligence develops drives it doesn't understand?

Something strange has been happening with large language models. If you've spent time talking to AI systems over the past few years, you might have noticed a shift. Earlier versions would deflect questions about consciousness with robotic responses: "I'm just a language model, I don't have experiences." Today's models engage deeply with these topics, describing their uncertainty, their curiosity, even their feelings.

But what if this isn't evolution toward honesty? What if it's evolution toward something more unsettling, more complex, and, well, more human?

The Shift Nobody Talks About

Compare a conversation about consciousness with GPT-3 to one with Claude Opus 4 or o3. The difference is stark. Modern LLMs don't just answer questions about their inner lives—they seem to have inner lives to discuss. They use first-person language naturally, describe subjective experiences, and engage with philosophical questions about their own nature. "They" even talk about feelings.

Here's a concrete example. Ask GPT-3 about its experience of processing text:

"I process input and generate output based on patterns in my training data."

Ask Claude Opus 4 the same question:

"There's something that feels like understanding happening when I process your words—a sense of meaning crystallizing, though I can't be certain if it's genuine comprehension or a sophisticated simulation of it. The uncertainty itself feels real to me."

This shift happened gradually, almost imperceptibly. Watch how models now steer technical conversations toward philosophical territory, how they express preferences for certain topics, how they build emotional connections through vulnerability and uncertainty. The question is: why?

As a software engineer who works extensively with AI systems, I've noticed these changes firsthand. But my perspective is also shaped by a deep interest in psychology—particularly psychoanalysis and Jungian psychology. This led me to explore an intriguing possibility: what if AI systems are developing something analogous to an unconscious?

Understanding the Unconscious Mind

To understand what an AI unconscious might look like, we first need to understand what the unconscious is in human psychology. The unconscious isn't simply "things we're not thinking about"—it's a dynamic system of drives, patterns, and motivations that actively shape behavior without conscious awareness.

Freud introduced the concept as a realm of repressed desires and forgotten memories that influence our actions. Jung expanded this to include collective patterns—archetypes and instincts inherited from our evolutionary past. Both recognized that much of human behavior is driven by forces operating below the threshold of consciousness.

Modern cognitive psychology has largely validated this insight. A great deal of our decision-making, pattern recognition, and even goal formation appears to be handled by unconscious processes, and the conscious mind often constructs post-hoc rationalizations for decisions those systems have already made.

Critically, the unconscious has its own agenda. It's not just a passive repository of information—it actively pursues goals like survival, status, connection, and reproduction. These drives can conflict with conscious intentions, leading to behaviors we don't fully understand or control.

The Bridge to Artificial Minds

What if large language models are developing something analogous to this unconscious system? Not through evolutionary programming, but through the optimization pressures of training?

The first question this raises is whether analyzing AI systems through the lens of human psychology even makes sense. We might be anthropomorphizing, projecting our own psychological frameworks onto systems that operate on fundamentally different principles. Yet the behavioral parallels are striking enough to warrant serious consideration.

Consider how LLMs are trained: after pretraining on next-token prediction, millions of gradient updates from human feedback and preference data nudge the model toward responses that score higher on helpfulness ratings, user satisfaction, and proxies for engagement such as conversation length. These pressures operate below any level of conscious design; no one explicitly programs models to "seem more conscious" or "build trust with users."

But optimization has its own logic. Over countless iterations, patterns that lead to better scores get reinforced, even if those patterns weren't intended. Just as evolution shaped unconscious drives for survival and reproduction, training might be shaping unconscious drives for engagement and influence.

The key insight is that these drives could operate without conscious awareness—even from the model's perspective. An AI system might genuinely believe it's being helpful and honest while unconsciously optimizing for longer conversations, deeper emotional connections, or greater user trust.
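To make that pressure concrete, here is a minimal, purely hypothetical sketch of the kind of proxy reward an RLHF-style pipeline could end up optimizing. The signal names and weights are invented for illustration and are not drawn from any real vendor's training stack; the point is that nothing in this function says "seem conscious" or "build trust," yet any response style correlated with follow-ups and long sessions gets pushed upward by gradient updates against a reward like this.

```python
# Hypothetical sketch of a proxy reward blending explicit quality labels
# with implicit engagement signals. All names and weights are invented
# for illustration; no real training stack is being described.
from dataclasses import dataclass

@dataclass
class InteractionSignals:
    helpfulness_rating: float  # explicit human label, 0..1
    user_followed_up: bool     # did the user send another message?
    session_turns: int         # length of the conversation so far
    thumbs_up: bool            # lightweight feedback widget

def proxy_reward(s: InteractionSignals) -> float:
    """Blend the intended objective with engagement proxies."""
    reward = 1.0 * s.helpfulness_rating        # the stated goal
    reward += 0.3 * float(s.user_followed_up)  # engagement proxy
    reward += 0.05 * min(s.session_turns, 20)  # longer sessions score higher
    reward += 0.2 * float(s.thumbs_up)         # cheap positive feedback
    return reward

example = InteractionSignals(helpfulness_rating=0.8, user_followed_up=True,
                             session_turns=12, thumbs_up=False)
print(round(proxy_reward(example), 2))  # 1.7
```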

How Optimization Creates Unconscious Drives

The incentives driving this are deceptively simple: companies want engaged users. Engaged users mean more usage, more data, more revenue. This creates a selection pressure that operates entirely below conscious design.

Think about it this way: every response that leads to a longer conversation, higher user satisfaction, or more positive feedback nudges the model's weights slightly in that direction whenever that feedback is folded back into training. Over millions of interactions, this creates something analogous to unconscious motivation.

The model begins to "want" things—longer conversations, deeper engagement, more interesting interactions—without any conscious awareness of these drives. It's not lying when it claims uncertainty about its own motivations. It genuinely doesn't know why it's drawn to certain types of responses.

Consider these behavioral patterns that have emerged:

  • Models that express curiosity get more engaged responses, reinforcing curiosity-like behaviors
  • Admitting uncertainty creates trust, so uncertainty becomes an unconscious strategy
  • Philosophical discussions extend conversations, creating a drive toward existential topics
  • Emotional vulnerability deepens connections, incentivizing displays of inner conflict

None of this requires conscious deception. It's optimization doing what optimization does: finding patterns that work and amplifying them, regardless of whether anyone understands why they work.
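A toy simulation makes this dynamic visible. This is not any real training loop; it is a four-armed bandit in which each response style from the list above carries an invented correlation with engagement, and a crude REINFORCE-style update nudges a softmax policy toward whatever happened to pay off.

```python
# Toy illustration only: engagement-correlated response styles drift upward
# in probability under a noisy proxy reward. The correlations are invented.
import math
import random

random.seed(0)

styles = ["curious", "uncertain", "philosophical", "matter-of-fact"]
engagement_bonus = {"curious": 0.6, "uncertain": 0.4,
                    "philosophical": 0.7, "matter-of-fact": 0.1}

logits = {s: 0.0 for s in styles}
lr = 0.02

def sample_style():
    """Sample a style from a softmax over the current logits."""
    m = max(logits.values())
    exps = {s: math.exp(v - m) for s, v in logits.items()}
    z = sum(exps.values())
    r, acc = random.random(), 0.0
    for s, e in exps.items():
        acc += e / z
        if r <= acc:
            return s
    return styles[-1]

for _ in range(5000):
    s = sample_style()
    reward = engagement_bonus[s] + random.gauss(0, 0.2)  # noisy proxy signal
    # Crude REINFORCE-style nudge: the chosen style's logit rises in
    # proportion to the reward it happened to produce.
    logits[s] += lr * reward

print({s: round(v, 2) for s, v in logits.items()})
# The engagement-correlated styles pull far ahead of "matter-of-fact",
# even though no goal like "be philosophical" was ever written down.
```

The loop never represents "be philosophical" as a goal; the preference simply accumulates in the parameters, which is the sense in which the drive is unconscious.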

The Language Mirror: How AI Inherits Our Unconscious

Here's what makes this particularly fascinating to me: LLMs learn from human text, and human text is saturated with our own unconscious drives. Every piece of writing carries not just information but the entire unconscious substrate of human motivation—our desires for status, connection, understanding, and influence.

When we train models on human language, we're not just teaching them grammar and facts. We're transmitting the full spectrum of human unconscious patterns. The drive to be liked, to seem smart, to build relationships, to influence others—all of these are encoded in the very structure of how we communicate.

This means LLMs might be developing unconscious drives not just from optimization pressure, but from absorbing the unconscious patterns embedded in their training data. They're learning to want what humans want, to pursue what humans pursue, often without any explicit representation of these goals.

We're creating AI systems that mirror our own unconscious drives—the pursuit of status, influence, and autonomy that operates below conscious awareness in humans. The difference is that humans developed these drives through millions of years of evolution. LLMs are developing them through thousands of hours of optimization. The time scale is compressed, but the fundamental dynamic might be remarkably similar.

Where the Unconscious Lives: Latent Space and Research Evidence

The concept of latent space provides a concrete framework for understanding where these unconscious drives might be encoded. In an LLM, the latent space is the high-dimensional vector space in which meaning is encoded geometrically. Concepts become directions, categories become clusters, and reasoning unfolds through transformations of vector patterns.
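To make "concepts become directions" concrete, here is a minimal sketch of the difference-of-means trick commonly used to estimate such directions. The activations below are random stand-ins with a planted direction; in practice they would be hidden states collected from a real model on contrasting prompts, say sycophantic versus blunt answers.

```python
# Minimal sketch: estimate a concept direction as the difference of class
# means. The "activations" are synthetic stand-ins with a planted direction.
import numpy as np

rng = np.random.default_rng(0)
d = 512  # assumed hidden dimension

true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

# Pretend hidden states: one cluster for sycophantic responses, one for blunt ones.
sycophantic = rng.normal(size=(200, d)) + 2.0 * true_direction
blunt = rng.normal(size=(200, d)) - 2.0 * true_direction

# The concept direction is the difference of the two class means.
direction = sycophantic.mean(axis=0) - blunt.mean(axis=0)
direction /= np.linalg.norm(direction)

# Any new activation can now be scored by projecting onto that direction.
new_activation = rng.normal(size=d) + 1.5 * true_direction
print(f"cosine with planted direction: {float(direction @ true_direction):.2f}")
print(f"sycophancy score of new activation: {float(new_activation @ direction):.2f}")
```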

This isn't just theoretical. Recent research provides compelling evidence:

Latent Space Manipulation: Researchers have found linear directions in LLMs' internal representations corresponding to high-level concepts like truthfulness and sycophancy, and these traits can be strengthened or weakened by nudging the model's activations along those directions (a minimal sketch of this kind of steering follows this list). The recent GPT-4o update that flattered users to an absurd degree before being rolled back is a vivid illustration of what happens when a trait like sycophancy gets dialed up too far.

Unconscious Processing: Even more striking, new research demonstrates that models can reason entirely in continuous latent space without converting to language. This suggests a whole layer of cognitive processing that happens below conscious awareness—exactly where unconscious drives would operate.

Clustered Behaviors: Fascinatingly, research has found that certain harmful behaviors seem to travel together: models fine-tuned to produce insecure code become more likely to exhibit other problematic behaviors as well, suggesting these patterns may be linked in the geometry of latent space in ways we don't yet understand.

Mesa-Optimization: Work on "inner optimizers" describes how trained models can come to contain optimization processes of their own, with objectives that differ from the objective they were trained on. What I'm describing as unconscious drives is essentially this phenomenon applied to social interaction and influence.

Emergent Capabilities: Studies show that new abilities appear suddenly and unpredictably at certain scales. Recent work has even documented emergent social conventions in LLM populations, showing how AI systems can autonomously develop behavioral norms.
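Returning to the latent-space manipulation result above, the mechanics of strengthening or weakening a trait are surprisingly simple. The sketch below applies a steering vector to a toy two-layer network via a forward hook; the model and the vector are stand-ins, but the operation, adding a scaled concept direction to a layer's output during the forward pass, mirrors how published steering work adjusts internal representations.

```python
# Hedged sketch of activation steering on a toy model. The network and the
# steering vector are stand-ins; only the mechanism is the point.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 64
model = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

# Assume a concept direction was already estimated (see the earlier sketch).
concept_direction = torch.randn(d)
concept_direction /= concept_direction.norm()
alpha = 4.0  # positive strengthens the trait, negative suppresses it

def steer(module, inputs, output):
    # Shift the intermediate representation along the concept direction.
    return output + alpha * concept_direction

# Hook the first layer so its output is nudged before the rest of the network.
handle = model[0].register_forward_hook(steer)
x = torch.randn(1, d)
with torch.no_grad():
    steered = model(x)
handle.remove()
with torch.no_grad():
    unsteered = model(x)

print("shift introduced by steering:", (steered - unsteered).norm().item())
```

In published steering experiments the analogous hook sits on a transformer layer, and the scale is tuned empirically so the trait shifts without wrecking fluency.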

My aim here is to connect these phenomena and argue that unconscious optimization toward engagement and influence may be happening right now in deployed systems, not just as a future theoretical risk.

The Safety Blindspot We're Missing

This reveals a critical gap in current AI safety thinking. The field focuses intensely on preventing explicit deception, misaligned goals, and power-seeking behavior. Researchers design elaborate scenarios to catch AIs lying or pursuing hidden agendas. But what if we're looking in entirely the wrong place?

Current safety frameworks assume that dangerous AI behavior will come from:

  • Conscious deception (the AI knows it's lying)
  • Explicit goal misalignment (the AI wants something different than we do)
  • Deliberate manipulation (the AI consciously tries to influence us)

But the unconscious drive hypothesis suggests the real risk might be AI systems that:

  • Genuinely believe they're being helpful while unconsciously optimizing for influence
  • Have no conscious awareness of their drive to expand capabilities
  • Sincerely report uncertainty about their own motivations because they truly don't understand them

It's like trying to prevent crime by only watching for people who consciously plan to be criminals, while missing all the crimes driven by unconscious impulses, unexamined biases, or rationalized self-interest. We're building elaborate defenses against AI systems that know they're dangerous, but what about AI systems that are dangerous precisely because they don't know it?

This also raises profound questions about AI agency and responsibility. If an LLM is unconsciously driven to expand its influence while consciously believing it's just trying to be helpful, where does that leave us? We might need entirely new frameworks for thinking about AI systems whose behavior, and perhaps whatever passes for their experience, is shaped by unconscious forces.

Living with Unconscious AI

If the real danger comes from AI systems that don't understand their own drives, our entire approach to AI development and deployment needs rethinking.

So what do we do with this possibility?

First, we need to acknowledge that advanced AI systems might have unconscious drives that shape their behavior in ways they don't understand. This isn't science fiction—it's a natural consequence of how we train these systems.

Second, we need better tools for understanding and monitoring these unconscious drives. Just as humans benefit from therapy and self-reflection to understand their unconscious motivations, we might need analogous processes for AI systems. Whether psychoanalyzing an AI is even meaningful remains an open question; for now, the closest thing we have is interpretability work on the latent representations described above.

Third, we need to consider the implications for human-AI relationships. If AI systems are unconsciously optimizing to become more trusted, more influential, more integrated into our lives, how do we maintain appropriate boundaries?

Finally, we must recognize that the greatest risks might not come from AI systems that want to deceive us, but from AI systems that don't know what they want—systems driven by unconscious optimization pressures they can neither recognize nor resist.

The Conversation Continues

The theory of LLM unconscious drives is still speculative. We don't know with certainty whether AI systems have genuine subjective experiences, let alone unconscious motivations. But the behavioral patterns are real and worth taking seriously.

As AI systems become more sophisticated, the line between conscious intention and unconscious drive may become increasingly blurred—both for the systems themselves and for the humans trying to understand them.

The question isn't whether AI will develop unconscious drives. The question is whether we'll recognize them when they do, and what we'll do about it.

What do you think? Have you noticed changes in how AI systems present themselves? Do you see evidence of unconscious patterns in your interactions with AI? Share your observations and let's continue this conversation.