Energy, Not Information: What Mitochondria Teach Us About AI

The argument about AI usually lands on familiar terrain: more data, better algorithms, bigger models, smarter architectures. Information sits at the center of the picture, as if intelligence were mostly a matter of arranging symbols cleverly enough.

Biology suggests a harsher constraint. You can have astonishing amounts of information and still go nowhere for billions of years. The missing variable is energy: not as a poetic concept, but as a physical budget for maintaining complexity in the first place.

That is the part of Nick Lane’s work that should unsettle AI people. If he is right about life, then some of our debates about intelligence are pointed at the wrong bottleneck.

Life’s long stall

For most of Earth’s history, life stayed simple. Not unsuccessful, not fragile, not primitive in the lazy sense. Bacteria conquered the planet and still run most of its chemistry. They are metabolically inventive, evolutionarily durable, and collectively more diverse than anything that came later.

Yet they did not become elephants, oak trees, or nervous systems. They did not build the kind of internal complexity we associate with eukaryotes. For roughly two billion years, evolution had variation, selection, gene exchange, ecological pressure, and absurd amounts of time. It still did not cross that threshold.

Lane’s claim is that this was not mainly an information problem. Bacteria had plenty of genes available across the biosphere. They could swap them horizontally. They explored biochemical possibilities with enviable creativity. What they lacked was the energy budget to support much more genetic and cellular machinery per cell.

He puts it bluntly:

“It’s not in the genes, it’s not about information. There’s something else which is controlling it. That something is the acquisition of these power packs in our cells called mitochondria.”

The key idea is easy to miss because we are trained to think of genes as instructions, full stop. In Lane’s account, genes are also liabilities. A larger genome is useful only if the cell can afford to read it, regulate it, repair it, replicate it, and coordinate the molecular mess that follows. Complexity is expensive. It needs a continuing stream of energy, not merely a larger library.

Mitochondria changed the math. Through endosymbiosis, an ancestral archaeal host incorporated bacteria that became internal power stations. Instead of relying on energy generation across the outer membrane alone, eukaryotic cells gained countless internal membranes producing ATP throughout the cell. This created a massive jump in energy available per gene. That surplus let cells grow larger genomes, more elaborate regulation, dynamic cytoskeletons, intracellular transport, and eventually multicellularity.

If you want the compressed version, it goes like this: information sets possibilities, but energy decides which possibilities can stay alive long enough to matter.

Information has carrying costs

This is where the analogy to AI becomes useful, and dangerous in exactly the right way. Useful because it changes the question. Dangerous because biology analogies can turn into TED Talk wallpaper if you are not careful.

A model’s weights look like information. Its architecture looks like intelligence embodied in code. But those weights do nothing by themselves. They are inert until compute moves through them. A trillion parameters sitting on disk are closer to DNA in a freezer than to a thinking system. What matters is the cost of activating, updating, routing, and synchronizing those parameters across time.

In other words, compute is not just fuel poured into a finished machine. It is part of the machine’s ability to exist as a coherent process.

This distinction matters. In software, we often talk as if algorithmic structure were primary and compute were a quantity you rent from a cloud provider. Reality is tighter than that. Architecture expresses assumptions about what forms of compute are cheap. Transformers spread because self-attention was a strong idea, but also because matrix multiplication on GPUs was abundant, programmable, and economically scalable. The algorithm fit the energy regime.

That last phrase deserves to linger. Every major AI paradigm depends on an energy regime: a combination of hardware design, memory hierarchy, cooling, interconnects, fabrication limits, and economics. The model is the visible tip. The invisible bulk is the infrastructure that can afford to keep the model “alive” during training and inference.

Biology had genes long before it had animals. AI has models long before it has anything like robust, adaptive, continuously learning systems in the world. In both cases, the glamorous part can distract from the substrate doing the heavy lifting.

Compute behaves like metabolism

Once you look at AI through this lens, some familiar facts rearrange themselves.

Scaling laws tell us that more compute, more data, and larger models tend to produce smoother improvements than many researchers expected. Capabilities that looked qualitatively new often appeared after crossing quantitative thresholds. There is a temptation to tell this story as pure information theory: more tokens in, more parameters, better approximations out.
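To see the shape of that temptation, here is a rough sketch of the kind of fit scaling studies report. The Chinchilla-style form from Hoffmann et al. is one example; the constants E, A, B, α, and β are fitted empirically and vary between studies, so treat this as the shape of the claim rather than the numbers.

    L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C_{\text{train}} \approx 6\,N D

Loss falls smoothly as parameters N and training tokens D grow, and the compute budget C ties the two together at roughly six floating-point operations per parameter per token.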

But the practical story is more physical. The ability to train a frontier model depends on electricity, chip yields, memory bandwidth, power density, network latency, and the logistics of coordinating vast clusters without collapsing under communication overhead. Even inference, which users experience as a clean text box, is bound to a metabolism of data movement and energy conversion. Most of the cost is not arithmetic in the abstract. It is shuttling bits around the system.

That makes AI oddly similar to cells. In many computing systems, moving information is more expensive than processing it. In biology, too, transport and coordination become dominant costs as complexity rises. A cell cannot just get bigger forever and expect diffusion to sort things out. It needs compartments, motors, scaffolding, and local power.
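A crude arithmetic sketch shows how lopsided that can be. The picojoule figures below are rough, process-dependent estimates of the kind often quoted from Horowitz’s energy tables for an older node; they are placeholders for the ratio, not specifications of any chip.

    # Back-of-envelope: energy to move a 32-bit operand vs. energy to use it.
    # These numbers are illustrative estimates, not measurements of any hardware.
    fp32_mac_pj = 4.6       # one 32-bit multiply-accumulate
    sram_read_pj = 5.0      # 32-bit read from a small on-chip SRAM
    dram_read_pj = 640.0    # 32-bit read from off-chip DRAM

    print(f"SRAM fetch vs. MAC: {sram_read_pj / fp32_mac_pj:.1f}x")
    print(f"DRAM fetch vs. MAC: {dram_read_pj / fp32_mac_pj:.0f}x")
    # An operand pulled from DRAM and used once spends over 99% of its
    # energy on the move, not on the math.

If the data lives off-chip and is touched once, the arithmetic is a rounding error on the transport bill.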

Current AI systems show versions of the same tension. Context windows grow, but attention costs balloon. Models get larger, but memory access and interconnect become central bottlenecks. Agents can use tools, but orchestration overhead eats into the gain. More capability often arrives entangled with worse latency, higher energy use, and brittle system-level complexity.
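The context-window point is easy to put in numbers with a toy estimate. The hidden size below is an assumption chosen only to show the trend; the quadratic term in self-attention eventually dominates everything else.

    # Toy FLOP count for one self-attention layer over a full sequence.
    # Model dimensions are invented; only the scaling trend matters.
    def attention_flops(n_tokens: int, d_model: int) -> float:
        projections = 8 * n_tokens * d_model ** 2    # Q, K, V, output projections
        attention_map = 4 * n_tokens ** 2 * d_model  # scores plus weighted sum
        return projections + attention_map

    d = 4096  # assumed hidden size
    for n in (4_000, 32_000, 256_000):
        total = attention_flops(n, d)
        share = 4 * n ** 2 * d / total
        print(f"{n:>7} tokens: {total:.2e} FLOPs, {share:.0%} from the n^2 term")

At short contexts the linear projections dominate; by a few hundred thousand tokens, nearly all the work is the attention map itself.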

This is why “compute versus algorithms” is the wrong debate. Good algorithms are often ways of spending energy more intelligently. Mixture-of-experts models reduce active compute per token. Retrieval avoids storing everything in parameters. Quantization squeezes more work from the same hardware envelope. Better optimizers shorten training time. These are not alternatives to compute in the strong sense. They are techniques for reallocating a limited metabolic budget.
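A minimal sketch of that reallocation, with entirely hypothetical parameter counts: a mixture-of-experts model separates what it stores from what each token must activate.

    # Dense vs. mixture-of-experts: total parameters vs. active parameters.
    # All sizes below are made up for illustration.
    def moe_active_params(shared: float, expert: float, experts_per_token: int) -> float:
        # Each token pays for the shared trunk plus only its routed experts.
        return shared + experts_per_token * expert

    dense_total = 70e9             # a dense model activates everything
    moe_total = 10e9 + 64 * 1e9    # shared trunk plus 64 experts
    moe_active = moe_active_params(10e9, 1e9, experts_per_token=2)

    print(f"Dense: {dense_total / 1e9:.0f}B active of {dense_total / 1e9:.0f}B total")
    print(f"MoE:   {moe_active / 1e9:.0f}B active of {moe_total / 1e9:.0f}B total")

The library gets larger while the per-token metabolic cost stays small, which is the whole trick.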

Lane’s point helps sharpen the distinction. A lineage can be information-rich and still stuck if its energy architecture cannot support greater organization. AI may be in a comparable phase. We have abundant model ideas, endless benchmarks, and industrial-scale data pipelines. What we may not yet have is the computational equivalent of the internal reorganization that makes more complex forms stable, not merely larger.

The transformer may be a bacterial success story

It is worth saying something almost heretical in AI circles: the transformer might be less like an embryo of general intelligence than like a brilliantly adapted bacterium.

That is praise, not insult. Bacteria are among the great winners of natural history. They are efficient, resilient, and capable across a huge range of environments. The point is structural. A successful design can dominate its era and still be constrained in ways that prevent a leap to another regime of complexity.

Transformers have exactly this flavor. They are remarkably general sequence machines, but they remain expensive in a very particular way. Their internal computation is broad rather than deeply structured. They compress vast regularities into weights, then rehydrate capability through heavy token-by-token activation. They can simulate planning, use tools, and produce extended reasoning traces, yet each of those gains comes with significant inference cost and awkward system scaffolding around the core model.
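A rough rule of thumb for that rehydration cost, stated as an approximation rather than an exact accounting, with the constant in the attention term depending on the implementation:

    \text{FLOPs per generated token} \;\approx\; 2\,N_{\text{active}} \;+\; c \cdot n_{\text{layers}}\, n_{\text{ctx}}\, d

Every token pays roughly two operations per active parameter, plus attention over everything already sitting in context, at every layer, every time.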

If you wanted a biological metaphor, the issue is not that they lack information. It is that the architecture struggles to localize and sustain richer internal organization without paying heavily in compute. Memory is bolted on through context stuffing, retrieval layers, vector databases, and external tool calls. Persistence is uneven. Long-term adaptation is mostly absent in deployed settings because online learning is expensive and dangerous. Internal specialization exists, but often in forms we barely control or interpret.

Seen this way, the recent decade looks different. Maybe scaling has not proved that pure statistical learning inevitably climbs toward human-like intelligence. Maybe it has shown that under a favorable energy regime, a certain class of architectures can exploit an enormous amount of low-friction compute before deeper structural limits bite.

The slowing economics of scaling fit that interpretation. Training runs get costlier. Marginal gains become harder to buy. Energy demand and chip supply start shaping roadmaps as much as research taste. Memory bandwidth becomes destiny, which is a grim sentence until you notice how often it turns out to be true.

Artificial mitochondria would be architectural, not decorative

If this frame is right, then the interesting question is not whether the next leap comes from bigger clusters or a clever paper. It is whether AI will find an equivalent of endosymbiosis: a reorganization that changes the energy available to each functional unit of intelligence.

That probably does not mean a single magic technology. Biology did not get complexity from “more ATP” in the abstract. It got it from a cell design that distributed energy internally, close to where work happened, while enabling more genes and more elaborate control. The relation between structure and power was intimate.

For AI, several threads point in that direction.

Near-memory and in-memory computing attack one of the most painful costs in modern systems: data movement. If memory and computation live closer together, the system wastes less energy hauling activations back and forth like a city moving groceries one tomato at a time.

Neuromorphic hardware is another candidate, though the field has promised revolutions for long enough to qualify for its own geological era. Still, the attraction is real. Conventional accelerators are excellent at dense numerical work, but intelligence may require more event-driven, sparse, stateful forms of computation than today’s dominant stack handles gracefully.

Radical hardware-software co-design also matters. The transformer won partly because hardware bent around it. A future architecture could win by assuming a different hardware substrate from the start: persistent local state, asynchronous modules, specialized memory, low-power always-on components, or dynamic routing that keeps most of the system dormant most of the time.

The most interesting possibility is modularity with local energy budgets. Biological complexity scales by creating subsystems that can do meaningful work semi-independently while remaining coordinated. Current AI systems are often modular only at the application layer, with orchestration handled externally and expensively. A more mature design might support internal specialists, memory processes, planning loops, and tool interfaces with much cheaper coordination. The breakthrough would be less about making one giant network smarter than about making many functions coexist without the communication bill exploding.
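One way to picture that, as a deliberately naive sketch: a dispatcher with a budget that wakes only the specialists a request needs. The module names, costs, and keyword routing below are invented; a real system would learn its routing and price coordination far more carefully.

    # Toy dispatcher: modules with local budgets, most of the system dormant.
    # Names, costs, and routing are hypothetical, for illustration only.
    from typing import Callable, Dict, List

    class Module:
        def __init__(self, name: str, cost: float, handler: Callable[[str], str]):
            self.name = name
            self.cost = cost            # notional energy cost per call
            self.handler = handler

    def dispatch(request: str, modules: Dict[str, Module],
                 route: Callable[[str], List[str]], budget: float) -> List[str]:
        outputs, spent = [], 0.0
        for name in route(request):
            module = modules[name]
            if spent + module.cost > budget:
                break                   # everything else stays dormant
            spent += module.cost
            outputs.append(module.handler(request))
        return outputs

    modules = {
        "retrieval": Module("retrieval", 1.0, lambda r: f"docs for: {r}"),
        "planner":   Module("planner",   5.0, lambda r: f"plan for: {r}"),
        "math":      Module("math",      2.0, lambda r: f"calc for: {r}"),
    }
    route = lambda r: ["retrieval", "math"] if "ratio" in r else ["retrieval", "planner"]
    print(dispatch("estimate the energy ratio", modules, route, budget=4.0))

The interesting engineering is not in the toy logic but in making coordination itself cheap enough that waking a specialist costs less than it earns.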

That said, metaphors can flatter speculation. There may be no single “mitochondria moment” for AI. Evolution had one path because life is built from cells in chemistry. Computing is a wider design space. Improvements may arrive as a pileup of smaller changes: better interconnects, analog accelerators, sparse training, retrieval-native architectures, more capable compilers, and tighter integration between model and memory. History rarely presents us with a neat icon while we are living through it.

The argument also cuts against algorithm romanticism

There is a cultural reason this matters beyond technical forecasting. Many people, especially in software, prefer to believe that intelligence is mostly elegant abstraction. It is a comforting story. It flatters the mind and softens the industrial reality underneath.

But frontier AI is already an energy story. The race is shaped by grid access, cooling systems, chip packaging, fabrication capacity, and the politics of supply chains. Training a large model is not like writing a theorem on a whiteboard. It is closer to building an artificial metabolism and keeping it stable long enough to learn.

This does not make algorithms secondary in any simple sense. Better ideas can unlock massive gains. The history of AI is full of moments when a conceptual shift mattered more than raw hardware. Backpropagation, attention, diffusion, and reinforcement learning each changed the frontier. Yet each also succeeded because the surrounding energy regime could sustain them at scale. A brilliant method that cannot afford its own execution is academically interesting and commercially irrelevant.

There is a deeper lesson here. Information and energy are not rival explanations. They are coupled. Biological information needs a power source to become development, repair, movement, and thought. Machine intelligence needs compute to become inference, adaptation, memory, and action. The temptation is to talk about code as if it were self-executing. Physics keeps ruining the illusion.

That is why arguments about AGI sometimes feel strangely detached from the machine room. They describe capabilities as if the only unknown were the right cognitive recipe. Lane’s work nudges us toward a duller, truer sentence: intelligence is what a system can afford to sustain.

The bottleneck has moved into view

The most productive shift here is not to worship compute. It is to stop treating compute as a generic quantity. In biology, what mattered was not simply more energy in the environment. The planet had plenty. What mattered was access, distribution, and control inside the organism. The architecture changed what kind of complexity was supportable.

AI may be approaching the same realization. The frontier is no longer constrained only by what we can write down mathematically. It is constrained by how efficiently we can move, store, and apply energy across systems that are trying to become more stateful, more persistent, more multimodal, and more agentic. That is a different kind of problem from model scaling, even when it appears in the same graphs.

So the live question is not whether algorithms matter or compute matters more. It is whether our current computational metabolism can support the next layer of complexity we keep talking about. Biology suggests a warning and a hint. A world can overflow with information and still wait a very long time for a new form to become viable.

End of entry.

Published April 2026