15 min read

Biosafety Already Taught Us the Failure Modes of AI Safety

The most useful lesson for AI safety did not come from machine learning.

It came from biology, where researchers have been arguing about powerful tools, voluntary restraint, and catastrophic misuse for decades. George Church, one of synthetic biology's most visible figures, has spent years warning that good intentions would not be enough. Looking back on the field's attempts to govern itself, he put it plainly: “One of the things I advocated in 2004 is that we stop deluding ourselves into thinking that moratorium and voluntary signups to be good citizens is going to be sufficient.”

That sentence should hang over every summit, safety pledge, and glossy frontier-lab announcement. Biology already tested the idea that norms plus signatures could contain a technology racing ahead of institutions. The result was not safety. The result was He Jiankui.

In 2018, He announced that he had edited human embryos that were later brought to term, producing children with altered genomes. This was not a surprise in the deep sense. It was a breach, but not an unimaginable one. The technical path existed. The incentives existed. The oversight was soft. Church's verdict was unsentimental: “That was clearly a failure of the whole moratorium and voluntary and whistleblower components.”

The reason this matters for AI is not that biology and machine learning are identical. They are not. One manipulates cells, organisms, and wet labs. The other manipulates code, compute, and increasingly the behavior of people at scale. But the governance problem rhymes almost line for line. When capabilities spread faster than enforcement, appeals to responsibility become a kind of theater. Sometimes useful theater, because norms do shape behavior at the margins. Still theater.

Moratoriums signal concern more than they impose control

A moratorium can be valuable. It can buy time, create common language, and make reputational boundaries visible. It can tell the careful majority that a line exists. What it cannot do, by itself, is stop the ambitious minority from crossing it.

That was true in gene editing. The international debate around germline modification produced statements, summits, and a broad sense that making heritable edits in humans was premature and dangerous. For several years, that consensus held reasonably well. Church even noted that the moratorium “worked for five years with only one defector. That's quite impressive.”

Church is right. In ordinary policy discussions, one defector in five years might sound like success. In catastrophic domains, the math is harsher. If the act in question can permanently alter human germlines, “almost nobody did it” is thinner comfort than it sounds. The relevant question is not whether most actors complied. The relevant question is whether one actor could create irreversible consequences before detection or intervention. He Jiankui proved the answer was yes.

This is the part many AI discussions still evade. Voluntary commitments are often framed as if they were control systems. They are not. They are declarations by actors who currently believe restraint is in their interest, morally or strategically or reputationally. The minute that belief weakens, the commitment weakens with it.

You can already see the pattern. Frontier labs publish responsible scaling policies, red-teaming frameworks, and deployment principles. Some of this work is serious and useful. Some of it is mostly a public signal to governments and customers that the adults are in the room. The underlying problem remains. If there is no robust monitoring, no credible penalty, and no independent authority capable of slowing deployment, the promise is conditional in the most important way. It holds only while the promisor wants it to hold.

That makes moratoriums and pledges less like guardrails and more like lane markings. Helpful when everyone agrees to stay inside them. Decorative when somebody decides otherwise.

Offense gets cheaper, subtler, and harder to spot

Church's next observation is even more unsettling because it is structural, not cultural. “Offense awfully does have an advantage,” he said. In biology, the offensive side keeps benefiting from tools that are cheaper, smaller, and less visible. His phrasing is precise: technology enables “smaller and smaller efforts that are harder and harder to detect, more and more subtle to the stochastic variation between people.”

That line captures the whole problem in one sweep. As capabilities mature, harmful acts require less infrastructure and blend more easily into background noise. A clandestine nuclear weapons program leaves a large industrial footprint. Gene editing does not. An engineered pathogen can emerge from a much smaller operational surface than a missile program. Detection becomes more like searching for a manipulated sentence in a library than spotting a tank column from space.

AI has the same drift. The frontier models still require enormous training runs, giant data centers, and power bills that look like municipal planning problems. But the dangerous actions these systems can enable or automate do not always require building the next largest model. A malicious actor might fine-tune an existing model, chain tools together, or use open weights with domain-specific data. The barrier between “state-scale capability” and “small-group capability” keeps thinning.

The offense advantage in AI also has a different flavor from biology, and that difference may make it worse. Biological attacks still have to move through the friction of the physical world. Samples, delivery mechanisms, symptoms, hospital systems, sequencing pipelines. AI attacks can travel at network speed. A model that improves phishing, malware generation, social engineering, scientific search, or autonomous vulnerability discovery can be copied, hidden, and reused quickly. The same architecture that helps a researcher automate tedious work can help an attacker industrialize a campaign that used to require a team.

This is why shallow comparisons to nuclear deterrence keep missing the point. Nuclear weapons are terrifying, but they are hard to build, hard to hide at scale, and tied to state capacity. Biology and AI are drifting in the other direction. Capability is becoming more distributed. Tooling is becoming more user-friendly. The expertise floor is lowering. Offense does not need to become easy in some cartoon sense. It only needs to become easier faster than defense becomes reliable.

And defense is usually asked to do more. The defender must monitor many pathways, distinguish signal from noise, absorb false positives, and keep institutions functional under stress. The attacker needs one workable route.
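
That asymmetry compounds multiplicatively. As a toy sketch (the reliability figure is an assumption chosen for illustration, not a measurement): if the defender covers each pathway with 95 percent reliability, the probability that every pathway holds is that reliability raised to the number of pathways, and it collapses as the attack surface grows.

```python
# Toy illustration of the attacker/defender asymmetry.
# d = per-pathway detection reliability (an assumed figure, not a measurement).
# The defense holds only if *every* pathway holds; the attacker needs one gap.

d = 0.95
for k in (5, 20, 50):  # number of independent pathways the defender must cover
    p_all_hold = d ** k
    print(f"{k:>2} pathways: P(defense holds everywhere) = {p_all_hold:.2f}")
# 5 pathways: 0.77, 20 pathways: 0.36, 50 pathways: 0.08
```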

Catastrophic systems are dominated by the outlier

There is a sentence from Church that should probably be tattooed somewhere in every policy office dealing with frontier technologies: “All it takes is one, one group probably, or one person.”

This is the core asymmetry. In many areas of life, averages tell you a lot. If 99.9 percent of bridges hold, the system is mostly fine. If 99.9 percent of commercial flights land safely, the system is extraordinarily safe. In catastrophic risk domains, averages can hide the whole story. One release, one genome edit, one exploit chain, one model deployment in the wrong hands can matter more than a thousand compliant actors.
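
A toy calculation makes the point concrete. Treat each actor as independently crossing the line with some small probability; the chance that at least one of them does grows fast with the size of the field. The figures below are illustrative assumptions, not estimates of any real population; this is a minimal sketch of the arithmetic, nothing more.

```python
# Toy model: probability that at least one of n independent actors
# defects within a given horizon. All numbers are illustrative
# assumptions, not empirical estimates.

def p_any_defection(p_per_actor: float, n_actors: int) -> float:
    """P(at least one defection) = 1 - P(nobody defects)."""
    return 1 - (1 - p_per_actor) ** n_actors

# Even a highly "compliant" population fails the tail-risk test:
# 99.9% per-actor compliance across 500 actors still leaves roughly
# a 39% chance that someone crosses the line.
for p, n in [(0.001, 100), (0.001, 500), (0.01, 100)]:
    print(f"p={p}, n={n}: P(at least one defector) = {p_any_defection(p, n):.2f}")
```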

That changes what counts as adequate governance. It is not enough to say that the large companies are responsible, the leading labs care about safety, or the scientific community broadly agrees on best practices. Those facts may all be true and still leave the main vulnerability untouched. The tail risk sits with the actor who is less constrained, less visible, or simply more reckless.

Biology makes this painfully obvious because public health is collective defense. A pathogen does not care whether most labs followed the protocol. The system is exposed to the weakest relevant point. Human germline editing is similar in a different way. Once a child is born with an intentionally edited genome, the event has already happened. The social process can condemn it after the fact, but condemnation is not reversal.

AI has its own versions of this one-defector logic. A single group could release a model with dangerous capabilities and weak safeguards into an ecosystem that cannot pull it back. A single state or criminal network could combine model autonomy, cyber tooling, and targeted biological knowledge in ways that move much faster than any regulator can respond. A single actor could also create a norm-shattering precedent. Once something has been shown to be doable and survivable, imitation gets easier.

People sometimes hear this and conclude that if perfect enforcement is impossible, regulation is futile. That is the wrong lesson. The real lesson is that systems exposed to tail risk need thicker layers than reputation and etiquette. Aviation did not become safer because pilots promised to care more. Biosafety did not improve because researchers discovered a new fondness for paperwork. Risk fell where institutions created inspection, accountability, and technical barriers that made failure harder to initiate and easier to catch.

Competition strips away the soft parts of safety

There is another line from Church that lands with uncomfortable precision in today's AI environment: “What typically happens when there's an intense competition is those safety rules get undermined and pushed aside.”

This is not a character flaw unique to scientists or founders. It is a property of races. If the prize appears large enough, every safeguard starts getting reframed as delay, and delay starts looking like defeat. The rhetoric shifts first. People still say safety matters, but now it must be “balanced” against innovation. Then the internal incentives shift. Launch dates pull harder than evaluation results. Access expands before monitoring is ready. Risk arguments get treated like negotiation tactics.

Biology has lived through this dynamic repeatedly. Academic prestige, venture funding, national ambition, and therapeutic hopes all push toward moving faster. Sometimes those pushes are justified. A field can be slowed by fear as well as by recklessness. But intense competition has a reliable side effect: it turns voluntary safety into a tax that each actor hopes others will continue paying.

That is exactly why so many current AI debates feel thin. They focus on the sincerity of leaders rather than the structure around them. A sincere executive is better than a cynical one, just as a careful scientist is better than a careless one. Yet if the market rewards faster deployment, and if geopolitical narratives reward visible capability gains, sincerity becomes fragile. People rationalize. They tell themselves the next release is manageable, the next scaling step is still inside the envelope, the next shortcut is temporary. By the time the field notices the safety margin has been consumed, the incentives that consumed it are entrenched.

This is also where public discussion gets distorted by a false image of heroism. We like stories where restraint depends on enlightened individuals choosing wisely under pressure. Real systems should not ask for that level of personal virtue every quarter. They should make the safer action legible, enforceable, and less punishing.

The lesson from biosafety is not that competition always destroys governance. It is that governance weak enough to depend on unanimous restraint will predictably bend under competition.

Safety becomes real when someone can see, report, and punish

Church's preferred response is refreshingly unromantic. He points to surveillance, consequences, and “mechanisms for whistleblowers to make it easy for people to report things that they think are out of line.” That is not a slogan. It is an institutional design brief.

Start with surveillance. A rule without visibility is mostly aspiration. In biology, the practical analogues include screening DNA synthesis orders, monitoring unusual pathogen work, tracking high-risk materials, inspecting facilities, and building public-health detection systems that can spot anomalies early. None of these measures is flawless. They create friction, not perfection. Friction matters because many dangerous actions are opportunistic. Raise the odds of detection, and you reshape behavior before you ever impose a penalty.

The AI equivalent is not mysterious, even if it is politically awkward. It means monitoring large training runs, requiring incident reporting, auditing evaluations tied to dangerous capabilities, controlling access to high-end compute for sensitive projects, and building logging and provenance systems that make misuse easier to trace. Some of this already exists in fragments inside companies. The question is whether it remains internal and discretionary or becomes standardized and externally reviewable.
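
To make one of those fragments concrete, here is a deliberately simplified sketch of what flagging large training runs could look like. The threshold, the field names, and the reporting hook are hypothetical placeholders for illustration, not any real regulator's schema or any lab's actual system.

```python
import json
from dataclasses import dataclass

# Hypothetical reporting threshold, loosely inspired by public proposals
# to flag training runs above a fixed compute budget. The number and
# every field name below are illustrative assumptions.
REPORT_THRESHOLD_FLOP = 1e26

@dataclass
class TrainingRun:
    run_id: str
    operator: str
    estimated_flop: float

def audit_record(run: TrainingRun) -> str | None:
    """Return a JSON audit record if the run crosses the threshold, else None."""
    if run.estimated_flop < REPORT_THRESHOLD_FLOP:
        return None
    return json.dumps({
        "run_id": run.run_id,
        "operator": run.operator,
        "estimated_flop": run.estimated_flop,
        "action": "report_to_oversight_body",  # placeholder, not a real API
    })

record = audit_record(TrainingRun("run-0042", "example-lab", 3.2e26))
if record:
    print(record)  # a real system would ship this to an external, append-only log
```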

Consequences matter for the same reason. A pledge with no sanction is branding. In biosafety, professional censure matters a little, criminal liability matters more, and funding restrictions matter a great deal. He Jiankui did not merely face angry op-eds. He was prosecuted and imprisoned under Chinese law. That did not undo the event, but it did show that society can attach more than reputational cost to a boundary crossing.

AI still lives in a much softer enforcement world. The largest penalties tend to arrive after consumer harm, privacy violations, or antitrust concerns. Capability-related recklessness remains harder to define and therefore easier to excuse. If policymakers want safety commitments to function as more than polished PDFs, penalties for concealment, negligent deployment, and deliberate evasion have to be specific enough that general counsels can read them without squinting.

Whistleblowing is the part that sounds procedural until you need it. Many dangerous programs are visible first to people inside them: researchers who notice a skipped evaluation, engineers pressured to disable safeguards, compliance staff who see a gap between public claims and internal practice. If reporting means career suicide, the signal dies early. If reporting is protected, confidential, and attached to an authority that can act, institutional reality changes. People stop feeling like the only alternative to silence is martyrdom.

None of this is glamorous. That is one reason it is often postponed in favor of declarations about principles. Principles photograph better. Monitoring systems are ugly, expensive, and annoying to the people being monitored. They are also what mature societies build when they have stopped pretending that goodwill scales on its own.

The ethics problem is deeper than alignment marketing admits

Church also puts his finger on a more philosophical issue, and it deserves more attention than it gets: “I don't think we understand our own ethics well enough to educate a completely foreign type of intelligence. We barely know how to pass it onto the next generation of humans.”

That line cuts through a lot of inflated talk about alignment. It does not mean the alignment project is pointless. It means the easy version of the story is fiction. Humans do not possess a clean, settled, portable ethical package waiting to be uploaded into advanced systems. We have contested values, institutional compromises, cultural variation, and many examples of behaving worse than our stated principles. Turning that mess into machine behavior is not impossible, but it is not a software ticket either.

Biology has an analogue here too. Debates about embryo editing were never just technical. They exposed disagreement about disability, consent, reproductive freedom, enhancement, inequality, and the authority of science itself. A field can know a great deal about what it can do while remaining deeply divided about what it should do.

AI safety often collapses these layers. One conversation concerns misuse: helping bad actors do harmful things. Another concerns misalignment in the stronger sense: systems pursuing goals in ways humans did not intend or cannot control. A third concerns social power: who gets to define acceptable behavior, acceptable speech, acceptable risk. These are related, but they are not the same problem, and pretending otherwise leads to shallow fixes.

The biosafety comparison is useful because it reminds us that technical containment and ethical legitimacy must develop together. If a field only builds capability controls without public legitimacy, trust erodes. If it only talks about values without building enforceable controls, danger compounds. That dual requirement is inconvenient, which is why many institutions try to pick one half and market it as the whole solution.

Time is not an external force

The most striking thing in Church's remarks may be his calm rejection of urgency. On embryo editing, he has argued there is no need to rush, calling it “a completely artificial emergency.” That phrase travels well beyond biology.

Many technology races dress themselves in the language of inevitability. Progress is coming anyway. Someone else will do it. The nation must lead. The market will not wait. Sometimes those claims describe real pressure. Often they function as a solvent, dissolving the burden of proof for caution. The schedule begins to look like physics rather than choice.

But timing is policy. Fields speed up because investors, states, firms, and cultures reward speed. They slow down when those same institutions decide some capacities should arrive with more testing, narrower access, and clearer accountability. There is nothing natural about treating every possible capability gain as urgent. That is a social decision wearing a lab coat.

This may be the most direct biosafety lesson for AI. The danger is not only that one malicious actor exists, or that offense has an edge, or that voluntary rules leak under pressure. The danger is that institutions know all this and still organize themselves around acceleration. If you combine asymmetric risk, weak enforcement, and a race mindset, you should not be surprised when safety language becomes decorative.

Biology has not solved its governance problem. Synthetic biology remains powerful, distributed, and hard to police. Yet it has at least forced a sobering recognition: a field handling catastrophic potential cannot rely on manners, consensus statements, and the hope that nobody ambitious decides to become historically significant in the worst possible way.

AI is walking into the same terrain with even stronger incentives to move fast and even fewer settled institutions around it. The parallel matters because it strips away a comforting fantasy. We are not waiting to discover whether voluntary restraint can carry the load. Another field already ran that experiment, and the result was clear enough that ignoring it now would be a choice rather than an oversight.

End of entry.

Published April 2026