Your AI Has the Same Problem Your Brain Does
More context is supposed to make models smarter. In practice, it often makes them sloppy.
Neurologist Richard Cytowic has a blunt way to describe the human constraint: the brain runs on a fixed energy limit and a fixed bandwidth, and no amount of Sudoku is going to change that. Read that line again and it sounds uncomfortably current. It describes large language models almost as well as it describes people.
That is the interesting part of this moment. The human brain and the modern transformer are wildly different machines, built by different processes, on different substrates, for different purposes. Yet they keep running into the same bottleneck. Both are flooded with information. Both have a narrow channel for what can actually be used right now. Both degrade when we confuse storage capacity with usable attention.
The comparison is useful because it turns a vague complaint about “model quality” into something more concrete. A lot of AI failure is not mystery. It is overload.
Bigger context windows do not create bigger minds
Cytowic describes working memory as a mental scratchpad. It is the small space where relevant items are held long enough to act on them. The classic number from George Miller was seven plus or minus two items, though later work suggests the real number is often closer to four chunks. The exact count matters less than the shape of the limit. Working memory is tiny. It is selective. It collapses under interference.
You can feel this without a lab. Try keeping a verification code in your head while somebody asks for directions and your phone starts vibrating. Nothing dramatic happens. You just lose the code. The brain did not malfunction. It hit capacity.
Language models hit the same wall in a more expensive way. Their advertised context windows look enormous: hundreds of thousands of tokens, sometimes more than a million. That sounds like abundance. It is mostly marketing if the task requires sustained reasoning across that material. Researchers, including teams at Microsoft, have shown that effective memory saturates well before the theoretical limit. Give a model long documents, competing instructions, tool traces, and a multi-step task, and parts of the prompt fade into the background. The system technically “has” the information. Functionally, it does not.
This is why the lost-in-the-middle problem matters. Information at the start and end of a long prompt often receives better treatment than material buried in the center. The model has not forgotten in a human sense. It has spread its attention too thinly across a large field. A wider prompt becomes a larger haystack, and relevance has to fight harder to stay visible.
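A minimal way to probe this is a needle-in-a-haystack test: plant one fact at different depths inside filler text and ask the model to retrieve it. The sketch below only builds the prompts; the model call, the filler sentence, and the needle fact are all placeholder assumptions, not any published benchmark's exact setup.

```python
# Sketch of a lost-in-the-middle probe: bury one "needle" fact at
# varying relative depths in filler text. Plug in your own model
# client to score retrieval at each depth.

FILLER = "The sky was a uniform grey that afternoon. "  # neutral distractor
NEEDLE = "The vault code is 4719. "                     # fact to retrieve
QUESTION = "\nWhat is the vault code?"

def build_prompt(depth: float, n_filler: int = 200) -> str:
    """Place the needle at a relative depth in [0, 1] of the haystack."""
    cut = int(depth * n_filler)
    parts = [FILLER] * cut + [NEEDLE] + [FILLER] * (n_filler - cut)
    return "".join(parts) + QUESTION

# Probe start, middle, and end. In published evaluations,
# mid-prompt placements tend to score worst.
prompts = {d: build_prompt(d) for d in (0.0, 0.5, 1.0)}
for depth, p in prompts.items():
    print(depth, len(p), NEEDLE.strip() in p)
```

Running the same question against each prompt, and plotting accuracy by depth, is essentially how the lost-in-the-middle effect was first charted.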
The Oscars envelope mix-up in 2017 is still a near-perfect metaphor. PwC partner Brian Cullinan was handling the award cards while tweeting a photo of Emma Stone backstage. He handed the wrong envelope to Warren Beatty and Faye Dunaway, and La La Land was announced instead of Moonlight. That was not a moral failure or a bizarre edge case. It was a working-memory failure with a global audience. A model juggling system instructions, retrieved documents, tool outputs, and an ambiguous user request does the same kind of thing. It loses the envelope.
Intelligence runs on a budget
The brain consumes about 20 percent of the body’s energy while making up roughly 2 percent of its mass. Even that flattering number hides the real constraint. Most of the energy is not spent on lofty thought. It goes to maintenance: preserving gradients, pumping ions, keeping cells ready to fire. The glamorous part of cognition lives on what is left over.
A language model looks different from the outside, but the economics rhyme. Most of the compute in inference is not “reasoning” in the way product demos imply. It is matrix multiplication, memory movement, attention over tokens, cache reads, and all the machinery needed to keep the system coherent for one more generated word. When prompts get longer, the budget gets squeezed from two sides at once. Latency rises, memory bandwidth becomes more precious, and the model has less room to do useful discrimination.
This is why brute force stops looking magical after a point. Scaling laws have taught the industry a real lesson: more data and more compute improve performance, but with diminishing returns. You can buy gains. You cannot buy unlimited clarity. Every extra step costs energy, time, and money, and the marginal improvement shrinks.
That has become a physical story, not just a software story. Analysts at Bain and others expect AI infrastructure to demand an astonishing amount of new electrical capacity over the next several years. The power plant has entered the product roadmap. When an industry starts measuring progress partly in gigawatts, it is rediscovering the same truth biology learned long ago: intelligence is shaped by energy scarcity, not liberated from it.
Attention is a selection mechanism
Cytowic makes another point that lands with surprising force outside neuroscience. The brain, he says, is a giant change detector. That makes evolutionary sense. On a relatively stable savannah, most incoming data did not matter. The rustle worth caring about was the one that signaled danger, food, or a social cue with consequences. Survival favored systems that could ignore almost everything.
Self-attention in transformers solves a different problem, but it uses a similar strategy. Each token evaluates which other tokens matter for the next prediction. This is not human attention. It is not awareness, and it is certainly not a tiny person in the weights deciding what is important. It is a relevance mechanism under scarcity.
That distinction matters because people often talk as if intelligence comes from storing more information. In practice, a large share of intelligence comes from filtering. Good systems decide what to elevate, what to compress, and what to drop. Old ideas in neural computation pointed in this direction long before today’s model boom. Researchers such as Sepp Hochreiter have connected transformer behavior to associative memory models like Hopfield networks. Others have explored whether some transformer-like operations can be approximated in biological circuits. The point is not that silicon models are becoming brains. It is that limited systems keep converging on selection as the central trick.
When a model underperforms on a long context, the failure is often described as memory failure. Sometimes it is better understood as prioritization failure. The evidence is present, but the signal never wins the internal competition.
Good memory requires offline work
Brains do not simply absorb information and keep running forever. Sleep is part of cognition, not downtime after cognition. During sleep, memory is consolidated, less useful traces are weakened, and metabolic waste is cleared. If you want a neat image, sleep is the least glamorous and most essential maintenance window in the system.
Machines mostly lack an equivalent. A standard language model does not finish a conversation, digest what happened, and reorganize itself into a better state for tomorrow. It starts fresh on each call, except for whatever external memory stack we bolt around it. In products, that stack usually means chat history, summaries, retrieval indexes, and maybe a profile store. Useful, yes. Equivalent to consolidation, no.
That gap shows up in long-running agents. Over time they accumulate logs, snippets, and summaries like a desk filling with sticky notes. Some notes are valuable. Some are stale. Some actively confuse the next step. Because there is no robust offline phase that rewrites the memory into a cleaner structure, drift builds. Context gets longer. Performance gets softer around the edges.
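What a consolidation phase might look like, in miniature: a memory that rewrites itself when the scratchpad overflows, rather than appending forever. The summarizer here is a trivial stand-in (a real system would run a model pass); the class and its thresholds are illustrative, not an existing library.

```python
class ConsolidatingMemory:
    """Toy agent memory with an explicit 'sleep' step: when the
    scratchpad fills up, old notes are folded into a compact digest
    instead of piling up as raw context."""

    def __init__(self, max_notes: int = 5):
        self.max_notes = max_notes
        self.digest = ""   # long-term, compressed store
        self.notes = []    # active scratchpad

    def add(self, note: str):
        self.notes.append(note)
        if len(self.notes) > self.max_notes:
            self.consolidate()

    def consolidate(self):
        # Stand-in summarizer: keep the first clause of each note.
        summary = "; ".join(n.split(".")[0] for n in self.notes)
        self.digest = (self.digest + " | " + summary).strip(" |")
        self.notes = []    # scratchpad cleared, not lost

    def context(self) -> str:
        return self.digest + "\n" + "\n".join(self.notes)

mem = ConsolidatingMemory(max_notes=3)
for i in range(7):
    mem.add(f"Step {i} succeeded. Extra detail {i}.")
print(len(mem.notes), len(mem.digest))
```

The context the agent sees stays bounded, because the offline pass keeps rewriting history into something smaller. That is the property sticky-note accumulation lacks.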
This is why test-time training and related approaches are interesting. They blur the line between inference and learning, treating fresh context less like passive input and more like material that can reshape the model’s state. If that line of work matures, it could give systems a way to convert experience into useful internal changes without waiting for a full retraining cycle. It could also create new failure modes, from instability to poisoning, so nobody should pretend it is a free lunch. Still, the direction makes sense. Systems that only stay “awake” and never consolidate are fighting biology and computer architecture at the same time.
Product design has to respect the bottleneck
A lot of current AI design quietly assumes the model can compensate for bad memory architecture if you just feed it enough context. That belief dies quickly in production.
Teams building reliable systems are learning a more human lesson. Break the job into stages. Retrieve only the evidence that matters now. Rewrite memory rather than endlessly append to it. Separate long-term storage from active scratchpad work. Schedule reflection or compression passes between bursts of activity. These ideas sound almost boring compared with a benchmark chart. They are also how you keep the system from drowning in its own notes.
There is a social version of this too. Humans work around cognitive limits with agendas, checklists, handoffs, and written decisions. Those tools are not signs of weakness. They are scaffolding for minds with narrow bandwidth. AI systems need the same kind of scaffolding. A well-designed agent stack should feel less like an infinitely patient savant and more like a competent team with clear records and good meeting hygiene. Slightly less cinematic, much more useful.
The bigger takeaway is that the bottleneck is not an embarrassing bug on the road to abundance. It is a governing fact. The most capable systems of the next few years may not be the ones with the largest raw context or the most extravagant inference bill. They may be the ones that treat attention as precious, memory as an active process, and compute as a budget to allocate rather than a mountain to burn through. If we keep treating longer prompts and larger clusters as a substitute for memory architecture, we will keep buying larger haystacks and wondering why the needle goes missing.
End of entry.
Published April 2026