
LLMs Don’t Have a Real Goal — Richard Sutton’s Foundational Critique

The strangest thing about the current AI boom is how often fluency gets mistaken for purpose.

A system writes clean code, explains quantum mechanics, drafts a legal memo, and suddenly people talk as if it has a mind moving through the world. Richard Sutton has spent decades building a very different picture of intelligence, and from that vantage point the story looks much less magical. His claim is simple enough to sound almost old-fashioned: intelligence is about achieving goals. A model that predicts the next token, however impressive the prediction, is not doing that.

Sutton is not a random contrarian lobbing rocks at the parade. He is one of the central architects of reinforcement learning, the co-inventor of temporal-difference learning and policy gradient methods, and the 2024 Turing Award winner. When he says large language models are missing something fundamental, it is worth slowing down long enough to understand exactly what he means.

His criticism is deeper than “LLMs hallucinate” or “they lack reasoning.” Those are symptoms. The actual objection cuts closer to the bone: next-token prediction is not a substantial objective because it does not involve acting in the world and learning from the consequences. That sounds abstract until you unpack it. Then it becomes one of the clearest fault lines in modern AI.

Intelligence requires a goal with consequences

Sutton often returns to a line from John McCarthy: intelligence is “the computational part of the ability to achieve goals.” That definition is narrow in a useful way. It forces the conversation away from surface behavior and toward control.

A calculator can return the right answer to a multiplication problem. A chess engine can choose moves that improve its position. A robot can learn which grip keeps a box from slipping. These systems differ wildly, but they share a structure. They have objectives, they take actions, and those actions lead to outcomes that can be evaluated. The loop matters.

Language modeling is a different loop. Tokens arrive. The system predicts what token comes next. If the prediction assigns high probability to the token that actually follows, the loss is low; if it misses, the weights get nudged. There is optimization happening, certainly. But the objective lives inside a textual stream. Predicting the next word does not, by itself, alter an external state. The world does not become warmer, safer, or more stable because the model guessed “therefore” instead of “however.”
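The shape of that loop can be made concrete with a deliberately tiny sketch. Here bigram counts stand in for the real cross-entropy objective, and the corpus is a made-up toy string, but the point survives the simplification: the model’s entire objective is to match the training distribution, and nothing outside the text stream is consulted or changed.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; any token stream would do.
corpus = "the model predicts the next token and the loop repeats".split()

# "Training": count how often each token follows each context token.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Return the continuation seen most often in training."""
    following = counts[token]
    return following.most_common(1)[0][0] if following else None

# The prediction is evaluated only against the text itself --
# no external state is altered by guessing right or wrong.
print(predict_next("the"))
```

A real language model replaces the count table with billions of parameters and the argmax with a softmax over a vocabulary, but the objective has the same character: fit the stream, not the world.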

That is what Sutton means when he says next-token prediction is not a real goal. He is not denying that it is an optimization target in the mathematical sense. He is saying it is too thin to count as intelligence in the stronger sense. It is a training signal detached from consequences.

This distinction gets blurry because language is how humans describe goals, report consequences, and coordinate action. A model that is good at text can talk about all those things with startling competence. It can explain how to repair a bike chain without ever touching grease. It can outline a negotiation strategy without having to live through the other party’s reaction. The map starts impersonating the territory.

Mimicry inherits human structure without owning it

The reason LLMs feel smarter than old statistical systems is that human language carries compressed traces of the world. Text is saturated with causality, social norms, practical knowledge, and the residue of millions of goals pursued by real people. Train on enough of it, and the model absorbs a remarkable shadow of human competence.

Sutton’s point is that this is still a shadow.

When a model predicts what a person would likely say next, it is not directly modeling the world. It is modeling artifacts produced by creatures who model the world. That difference can seem pedantic until you ask the model to operate under novelty, friction, and consequence.

A useful analogy comes from robotics. Suppose you train a robot only by watching videos of humans picking up cups. It may learn the rough shape of successful behavior. It may even imitate the motion convincingly in familiar conditions. But it has not learned what happens when the cup is wet, when the table tilts, or when its own gripper is slightly misaligned. Those facts show up when action meets resistance. Watching behavior is not the same as discovering control.

Large language models are, in a sense, spectacular behavior cloners for text. They inherit the structure embedded in human discourse. They do not automatically inherit the grounded process that produced it.

This is why the phrase “understanding the world” needs careful handling. If by understanding we mean “can generate language that reflects many regularities of the world,” then LLMs obviously do some of that. If we mean “can form and refine beliefs through intervention, using feedback from consequences,” then pretraining gets you only partway. The missing part is not cosmetic. It is the part where reality pushes back.

Ground truth is the missing anchor

Sutton makes an unusually sharp epistemic claim here: without ground truth, there is no genuine prior knowledge to speak of. That sounds counterintuitive because people often describe pretrained models as giant stores of prior knowledge. They have seen textbooks, code repositories, scientific papers, and manuals. What else would you call that?

The problem is that text does not come with a stable truth label for most of what matters. It comes with assertions, disagreements, fashions, misconceptions, deliberate lies, partial observations, and context that is often missing. The internet is a library where every shelf has useful material mixed with rumor, performance, ideology, and stale facts that were superseded years ago. A model trained to continue text learns patterns across all of it.

This does not mean pretraining is worthless. It teaches grammar, style, broad factual associations, common human framings, and many latent regularities. But when Sutton says “there isn’t any truth” in the relevant sense, he is pointing to something harder: in the empirical world, truth often arrives through experiment. You act, the world responds, and that response constrains what can be believed next.

A medical claim becomes sharper when a treatment succeeds or fails. A control policy becomes better when the robot drops fewer objects. A trading strategy stops looking clever after a month of losses. In each case, the system can update against consequences. There is a target outside the text.
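The structure of “updating against consequences” is exactly what Sutton’s own field formalizes. A minimal sketch, using a made-up two-action environment and a simple incremental value update (the bandit form of the idea, not any specific system): the only signal is the reward the world returns, and the estimates are pulled toward whatever the world actually does.

```python
import random

random.seed(0)

# Hypothetical environment: acting returns a noisy reward with a hidden mean.
def pull(action):
    true_means = {"a": 0.2, "b": 0.8}
    return true_means[action] + random.gauss(0, 0.1)

q = {"a": 0.0, "b": 0.0}   # value estimates, initially ignorant
alpha = 0.1                 # step size

for _ in range(500):
    # epsilon-greedy: mostly exploit the current belief, sometimes explore
    if random.random() < 0.1:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    reward = pull(action)                      # the world responds
    q[action] += alpha * (reward - q[action])  # update against the consequence

print(max(q, key=q.get))
```

Whatever the agent initially believed, the estimates end up anchored to the environment’s actual payoffs, because every update is a correction from an outcome. That anchor is precisely what a pretraining loss over text does not provide.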

Pretraining has no such anchor for most of its content. It absorbs correlations among utterances. It learns which continuations fit the discourse. That is why an LLM can sound deeply informed while remaining strangely untethered. The style of knowing is present. The hard correction mechanism is weaker.

Post-training helps, but only up to a point. Instruction tuning and preference optimization teach models to be more useful, less toxic, and more aligned with human expectations. They improve the product enormously. Yet the reward is still mostly about responses, not about long chains of action in an environment that pushes back with uncompromising feedback. A polite answer is easier to optimize than a successful intervention.

Mathematics flatters language models for a reason

One of the strongest objections to Sutton’s thesis comes from the recent wave of mathematical and coding performance. If these systems can solve Olympiad-level problems, write sophisticated programs, and carry out multi-step proofs, doesn’t that show they have crossed from mimicry into something richer?

It shows something important, but not quite that.

Sutton draws a line between computational worlds and empirical ones. Mathematics is a computational world. Its objects are abstract, its rules are crisp, and verification is unusually clean. You can plan inside that space. You can search. You can decompose a problem, test steps, backtrack, and eventually converge on a valid answer without touching the messy uncertainty of physical reality.

That is a very hospitable environment for systems built from pattern recognition plus search, especially when wrapped with external tools. A model can generate candidate steps, call a verifier, run code, inspect outputs, and iteratively repair the solution. The loop begins to look more like problem solving because, inside formal domains, there is a clear notion of correctness.
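That generate-and-verify loop is easy to caricature in a few lines. This is a toy formal task with invented numbers, not a real solver: a generator proposes candidate expressions, and a verifier checks each one exactly, which is the clean correctness signal that formal domains provide and empirical ones usually do not.

```python
import itertools
import operator

# Toy formal task: combine the numbers with two operators to hit the target.
numbers = [3, 4, 5]
target = 17
ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def candidates():
    """Generator: propose every left-associated two-operator expression."""
    for a, b, c in itertools.permutations(numbers):
        for sym1, f in ops.items():
            for sym2, g in ops.items():
                yield f"({a} {sym1} {b}) {sym2} {c}", g(f(a, b), c)

def solve():
    """Search plus verification: keep proposing until a candidate checks out."""
    for expr, value in candidates():
        if value == target:   # the verifier: an exact check, no ambiguity
            return expr
    return None

print(solve())
```

The generator can be arbitrarily bad at proposing, and the system still converges on a valid answer, because verification is cheap and unambiguous. Swap the target for “treat this patient” and both properties vanish.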

The empirical world is harsher. If you are controlling a warehouse robot, treating a patient, running a power grid, or managing a supply chain during a port strike, your beliefs are incomplete and your actions change the situation. The reward signal may be delayed. The environment may be partially observable. The causal story may shift under your feet. You cannot solve those domains by sounding like someone who once explained them well.

This is where language-model triumphs can mislead. Formal tasks reward internal coherence and verifiable steps. Real-world tasks reward adaptive control under uncertainty. The overlap is real, but it is not total. A system that shines in one can still be brittle in the other.

The industry prefers imitation because imitation scales

If Sutton’s critique is so fundamental, why has the market rewarded the opposite path so aggressively?

Because imitation is useful, and because learning from real consequences is expensive.

A model that drafts emails, summarizes meetings, generates boilerplate code, and answers support questions creates immediate value even if it has no deep objective beyond producing plausible text. Much of white-collar work is mediated through language. Compressing that layer of work turns out to be a huge business.

There is also a practical reason. The internet provides trillions of tokens. The world does not provide trillions of clean, safe, cheap interaction trajectories. Training a robot by trial and error breaks hardware, takes time, and raises safety problems. Training an assistant through human preference data is awkward enough. Training a broadly capable agent through open-ended real-world feedback is another order of difficulty.

So the field took the abundant substrate: human text. That was rational. It still is.

The trouble begins when product demos drift into claims about agency. A model that helps write a report is one thing. A model that is supposed to run projects, negotiate contracts, execute scientific research, or manage software systems autonomously needs something stronger than good textual instincts. It needs objectives that connect action to consequence, memory that persists across episodes, and learning signals that come from the world rather than from applause.

That is where Sutton’s critique stops being philosophical and starts looking operational. If your system is meant to do work instead of discuss work, the training story matters a great deal.

Agent wrappers change the picture, but only partly

There is a tempting rebuttal here. Today’s systems are not just base models completing text. They use tools, browse the web, write and execute code, call APIs, maintain memory, and sometimes operate in loops that look a lot like agency. Give an LLM a browser and a shell, and suddenly it can change the world.

That is true in a limited sense. A language model embedded in a larger scaffold can participate in goal-directed behavior. Once it selects actions, observes outcomes, and updates some policy over time, the system as a whole starts moving toward the kind of intelligence Sutton cares about.

But this does not erase the original critique. It relocates it.

The core pretrained model is still built mainly through next-token prediction. The wrapper supplies the goal, the tools, and the feedback. Sometimes that is enough. Many practical products will be good enough with exactly this architecture. Yet anyone hoping the model itself will become reliably agentic just by consuming more text is making a leap that Sutton rejects.
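That relocation can be seen in a toy scaffold. Everything here is invented for illustration: the “model” is a frozen function that maps a transcript to a proposed action, and the wrapper supplies the goal, executes the action, and feeds the outcome back. Any adaptation lives in the transcript the loop maintains, not in the model itself.

```python
# Hypothetical scaffold around a frozen text-in/text-out "model".
def frozen_model(prompt):
    """Stand-in for an LLM call: propose the next step from the transcript."""
    if "error" in prompt:
        return "retry with smaller step"
    return "take default step"

def environment(action):
    """The part that pushes back: consequences the model never trained on."""
    return "error: step too large" if "default" in action else "ok"

transcript = "goal: complete the task"
outcome = None
for _ in range(3):
    action = frozen_model(transcript)         # model proposes
    outcome = environment(action)             # world responds
    transcript += f"\n{action} -> {outcome}"  # wrapper carries the feedback
    if outcome == "ok":
        break

print(outcome)
```

The system as a whole corrects itself after the environment objects, but the correction is held entirely by the wrapper’s memory. Restart the loop and the model is exactly as naive as before, which is the gap between participating in goal-directed behavior and learning from it.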

What closes that gap is not another hundred billion tokens of internet prose. It is a stronger coupling between action and consequence. That may involve reinforcement learning over long horizons, simulation environments that capture genuine tradeoffs, or carefully constrained deployment where systems can learn from outcomes without causing harm. None of that is impossible. It is just much harder than scaling pretraining.

This is also where the conversation gets more honest. The central question is no longer whether LLMs are “really intelligent,” which quickly turns into semantics. The practical question is what kind of intelligence current methods are good at producing. Sutton’s answer is that they produce fluent approximations of human behavior, and that is not the same as systems that discover how to achieve goals under uncertainty.

Progress looks different when you take Sutton seriously

If Sutton is even mostly right, a lot of AI discourse has been measuring the wrong thing.

Benchmarks built around static question answering, style transfer, or decontextualized reasoning capture only a slice of what matters. They reward systems for producing outputs that look right to evaluators. They say less about whether a model can choose actions that improve a state of affairs over time. Those are different skills, and conflating them has made many AI claims sound larger than they are.

It also changes what counts as a genuine breakthrough. Another jump in benchmark scores is impressive. A system that learns robustly from the downstream effects of its own decisions would mark a different kind of advance. That shift may come through robotics, software agents, scientific discovery platforms, or hybrids that use language models as interfaces while learning policies through experience. The form is still unsettled.

What Sutton offers is not a dismissal of language models. It is a demand for sharper categories. LLMs are extraordinary cultural and technical artifacts. They compress human expression at a scale that would have sounded absurd a few years ago. They can be genuinely useful, sometimes transformative. But fluency, even very advanced fluency, should not be mistaken for a substantial objective.

The most interesting systems over the next few years will be the ones that stop merely predicting what people would say and start learning from what reality does in response.


Published April 2026