Why Human-AI Co-Improvement May Beat Autonomous AI
Everyone loves the dramatic version of the future. A machine disappears into a feedback loop, rewrites itself, and comes back smarter than its creators by Friday. It is clean, cinematic, and weirdly flattering to our engineering instincts. Build the system once, then let capability compound.
Jason Weston and Jakob Foerster, in a 2025 position paper from FAIR/Meta, argue for something less theatrical and more plausible. If the goal is to reach radically more capable AI, they suggest that keeping humans inside the improvement loop may be both faster and safer than trying to engineer a system that upgrades itself in isolation.
That claim cuts against the mythology around self-improving AI. It also deserves attention, because it lands on a point the field often blurs: the path that sounds most advanced is not always the path that works best.
The self-improvement ladder is real, but uneven
Weston and Foerster break self-improvement into six levels. The early levels are already familiar. Models improve their parameters through training. They improve their training data by generating synthetic examples. They can even shape parts of their own objective through self-evaluation or AI-generated rewards.
After that, the climb gets steeper. Architecture search pushes models toward better structures. Beyond that lies the idea of an AI rewriting its full codebase. Past that is full recursive self-improvement, where a system designs a more capable successor, which then repeats the process.
The first few levels are not science fiction. They are normal practice. Synthetic data pipelines, self-play, constitutional tuning, and automated search all move capability forward. The last levels remain far murkier. A Gödel Machine still feels more like a philosophical provocation than a shipping roadmap.
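To make the "early levels" concrete, here is a minimal sketch of what a synthetic-data loop with self-evaluation looks like in the abstract. Every function name below is a hypothetical stand-in for a whole pipeline, not an API from the paper or from any particular lab.

```python
# A toy version of the ladder's early rungs: the model proposes its own training
# data (level 2) and scores candidates itself (level 3), with no human in the loop.
# All names here are illustrative stand-ins, not real library calls.

import random

def generate_candidates(model, n=8):
    """Stand-in for the model proposing synthetic training examples."""
    return [f"candidate-v{model['version']}-{i}" for i in range(n)]

def self_score(example):
    """Stand-in for self-evaluation: the model judging its own outputs."""
    return random.random()

def retrain(model, data):
    """Stand-in for a training step on the filtered synthetic data."""
    return {"version": model["version"] + 1,
            "data_seen": model["data_seen"] + len(data)}

model = {"version": 0, "data_seen": 0}
for round_idx in range(3):
    candidates = generate_candidates(model)
    # Keep only examples the model itself rates highly.
    kept = [c for c in candidates if self_score(c) > 0.7]
    model = retrain(model, kept)
    print(f"round {round_idx}: kept {len(kept)} examples, model now v{model['version']}")
```

Nothing in this loop is exotic; it is roughly the shape of today's synthetic-data and self-play pipelines. The higher rungs of the ladder are what happen when the loop is also asked to rewrite its own objective, architecture, or code.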
That matters because public debate often compresses these levels into a single story. People hear “self-improving AI” and imagine a direct line from today's models to runaway autonomy. The paper pushes back on that compression. Some forms of self-improvement are ordinary engineering. Some require assumptions we have not remotely validated.
There is another problem hiding under the technical one. Even if full autonomy were possible, what exactly is the system improving for? “Make yourself better” is not a neutral instruction. Better at what, under whose values, and with what tradeoffs? This is the old alignment problem wearing a more ambitious jacket.
Most real breakthroughs are hybrids
The deeper claim in the paper is not simply that autonomous self-improvement is dangerous. It is that it may be inefficient.
That sounds strange at first. Surely the whole point of autonomy is speed. Remove the human bottleneck and progress accelerates. Yet the history of modern AI is full of advances that came from awkward, powerful combinations of human judgment and machine scale. ImageNet mattered because people decided to build the dataset and define the task before the models learned from it. Transformers did not emerge from a sealed autonomous pipeline. Researchers recognized the value of web-scale data, changed the objective, refined the architecture, and then iterated through thousands of messy decisions.
Breakthroughs often depend on taste before they depend on optimization. A field has to notice that an old constraint no longer matters, or that a neglected resource suddenly becomes usable. Humans are still unusually good at that kind of reframing. Models can generate many candidate moves. They are less reliable at knowing which shift deserves institutional commitment, compute budgets, or a year of research talent.
This is the paper’s central bet. The fastest route to much stronger AI may not be a machine escaping the loop. It may be a tighter loop.
In their framing, co-improvement means humans and AI systems participating together in the process of research, evaluation, implementation, and correction. The point is not to preserve human prestige. It is to exploit complementary strengths. Models are tireless, broad, and increasingly good at pattern search. Humans bring context, value judgment, causal suspicion, and a stubborn ability to notice when the metric has wandered away from the goal.
That last part sounds soft until you watch labs repeatedly optimize what they can measure and then discover, several months later, that users wanted something adjacent but different. A powerful assistant can help you move faster. It can also help you sprint in the wrong direction with world-class efficiency.
Co-improvement is a research method, not a slogan
The useful part of Weston and Foerster’s paper is that it does not stop at principle. They sketch a practical framework for collaboration across the whole research cycle.
Some of it is obvious in hindsight. AI can help researchers identify promising problems, propose hypotheses, and map prior work faster than any human literature review team. It can draft experiments, generate code, run variants, and surface anomalies in the results. But the paper extends the idea further, into benchmark design, error analysis, infrastructure choices, deployment constraints, and safety procedures.
That breadth matters. Too much work on “AI scientists” focuses on automating the visible artifact, usually a paper or an experiment log. The authors argue for something more grounded: improving the quality of science rather than merely accelerating the production of scientific-looking outputs.
That distinction is sharper than it sounds. A system that can generate ten thousand plausible experiments is not necessarily helping if no one has structured the search space well, checked whether the benchmark measures anything meaningful, or noticed that a hidden assumption invalidates the whole setup. Research is not just output generation. It is navigation under uncertainty.
The paper spreads this collaboration across a dozen mechanisms, but the pattern is simple. Humans and models can share work in problem selection, method design, execution, interpretation, and integration into real systems. The AI does not replace the lab. It thickens the lab.
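A minimal sketch of that shared loop, under the assumption that the human role is a substantive gate rather than a checkbox. The step names and data structures are illustrative only; the paper describes mechanisms, not this specific workflow.

```python
# A toy co-improvement research loop: the model drafts, humans decide what
# deserves compute, execution and integration are shared. Names are hypothetical.

def propose_experiments(model_state):
    """Stand-in for the model drafting candidate experiments and hypotheses."""
    return [{"id": i, "hypothesis": f"h{i}", "est_cost": i * 10} for i in range(5)]

def human_review(proposals):
    """Stand-in for human judgment: framing, value tradeoffs, budget discipline."""
    return [p for p in proposals if p["est_cost"] <= 20]  # e.g. reject over-budget runs

def run_and_analyze(proposal):
    """Stand-in for execution plus automated anomaly surfacing."""
    return {"id": proposal["id"], "result": "ok", "anomalies": []}

def integrate(findings, lab_notes):
    """Stand-in for folding results back into the lab's shared understanding."""
    lab_notes.extend(findings)
    return lab_notes

lab_notes = []
proposals = propose_experiments(model_state={"round": 0})
approved = human_review(proposals)            # humans choose what gets resources
findings = [run_and_analyze(p) for p in approved]
lab_notes = integrate(findings, lab_notes)
print(f"{len(approved)} of {len(proposals)} proposals approved; notes: {len(lab_notes)}")
```

The design choice worth noticing is where the gate sits: before compute is spent, not after the results are already generated and waiting for a signature.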
This idea also aligns with a broader line of thinking in academic work on human-machine cognition. Collins and colleagues, in a 2024 paper on building machines that learn and think with people, argued for systems designed around mutual adaptation rather than one-sided automation. Weston and Foerster push that logic into the most ambitious part of the field: the road toward much more capable models.
Keeping humans in the loop changes the safety picture
The paper’s most interesting move is on safety. It rejects the simple story that more capability automatically means more danger. The authors suggest that greater capability, developed in the right collaborative setting, can reduce specific harms.
Take jailbreaking. A model often fails not because it wants to be unsafe in any meaningful sense, but because it does not reliably understand the adversarial situation around it. It misses cues, follows surface-level instructions, or cannot reason well enough about the user’s intent. A more capable model may resist these attacks better precisely because it understands them better.
That is not a free pass for capability chasing. Smarter systems can produce smarter failures. But it does undermine a lazy binary where “more capable” and “less safe” move in lockstep. In practice, safety depends on how capability is built, what feedback channels exist, and whether the system is being shaped in contact with human oversight instead of beyond it.
Co-improvement changes the safety problem because values and procedures can be developed alongside capability rather than stapled on afterward. The paper talks about co-developing constitutions, oversight processes, and consensus-building mechanisms. This is less glamorous than the dream of a fully aligned autonomous intelligence emerging from first principles. It is also much closer to how stable institutions usually work. We refine norms while we refine tools. We do not solve society in advance and then deploy the machinery.
There is an important caveat. Humans inside the loop are not magic. Humans can rubber-stamp bad outputs, launder dubious decisions, or create a false sense of control around systems they barely understand. Anyone who has seen “human review” become a checkbox in enterprise software knows the genre. Co-improvement only helps if the human role is substantive, not ceremonial.
The paper picks a side in a growing split
Weston and Foerster are arguing against two increasingly visible visions of AI development.
One comes from the “Era of Experience” view associated with Demis Hassabis, David Silver, and Richard Sutton. In that world, advanced systems learn directly from interaction and experimentation at scale. They generate their own experience and improve through it, with less dependence on human-curated data or hand-designed interventions. Applied to domains like materials science or biology, the appeal is obvious. Let the system explore a vast design space that human researchers could never traverse alone.
The paper does not deny the power of that approach. It questions the assumption that humans should steadily recede from it. Scientific exploration is not just a search problem. It is also a framing problem, a governance problem, and a meaning problem. What counts as success in medicine, education, labor, or public infrastructure is never reducible to the raw objective function without residue.
The other vision is more openly post-human. Jürgen Schmidhuber has argued that if advanced AI spreads through the cosmos and humanity becomes a minor character in the story, that may still be a worthy outcome. Weston and Foerster plainly reject that orientation. Their model of progress keeps human participation central, not as a sentimental gesture, but as the thing that makes the system answerable to human life in the first place.
This is a philosophical dispute, but it is also a product decision. Do you build systems for substitution or for augmentation? Do you treat people as temporary scaffolding or enduring collaborators? The answers shape interfaces, evaluation metrics, training goals, and the political economy around deployment.
Open science still matters, even under pressure
The paper also defends a version of openness. That is notable, because many labs now treat secrecy as the mature stance and openness as a kind of adolescent idealism.
Weston and Foerster do not argue for dumping every model weight onto the internet forever. Their position is closer to managed openness: reproducibility, scientific exchange, and broad participation remain essential, but decisions about what to release should evolve with capabilities and misuse risk.
That stance is harder to meme than total openness or total closure, which is usually a sign that it belongs to reality. Science works because claims can be checked, methods can be challenged, and advances can diffuse beyond the labs with the largest compute budgets. If co-improvement is the right development model, then concentration becomes a real problem. A collaborative future cannot be built entirely behind API walls and red-team embargoes.
There is tension here. Open release can increase misuse. Closed development can concentrate power and reduce accountability. The paper does not solve that tension. It does insist, correctly, that “safety” cannot become a universal solvent for secrecy, especially when secrecy also protects competitive advantage.
The future in this paper is less autonomous and more interdependent
The phrase Weston and Foerster use is “co-superintelligence,” which sounds a bit awkward and is probably unavoidable. The concept underneath it is clearer than the name. They are describing a path where advanced AI expands human capability instead of routing around it.
That idea can sound conservative if you picture humans forever approving each model update with a clipboard. It is not conservative at all. A serious co-improvement regime would change how research teams operate, how institutions make decisions, and how expertise is distributed. People would rely on AI systems for interpretation, simulation, design, and critique at a depth that already strains older categories like “tool” or “assistant.”
What the paper resists is the assumption that progress culminates in human irrelevance. That assumption has always carried a faint smell of theology disguised as engineering. If the goal is better science, better judgment, and safer systems, then the more demanding challenge is not to remove people from the process. It is to build forms of collaboration where the combined system actually thinks better than either side alone.
Published April 2026