
Convenience Is Teaching Us to Trust AI Agents Too Early

Sam Altman said he was absolutely certain he would never give Codex full access to his computer. He lasted two hours. Then he turned off manual approvals, and he never turned them back on.

That admission matters because it strips away the usual fiction. We like to imagine risky behavior comes from careless users, impatient managers, or the less technical corners of the market. In this case, one of the people closest to the machinery described the same pattern everyone else is about to live through. The agent seemed reasonable. The approvals became annoying. Convenience won.

That is a product truth, not a character flaw. And it points to a larger problem than one executive toggling a setting. As AI agents get more capable, safety friction will keep eroding in ordinary, incremental ways. Most of the time, nothing visibly bad will happen. That is exactly why the risk is easy to miss.

Convenience beats vigilance

Security systems are often designed around an ideal user who stays alert forever. Real users do not behave like that. They form habits. They stop examining repeated prompts. They optimize for flow, especially when the tool keeps being useful.

Classic software already taught this lesson. Browser warnings became wallpaper. Cookie banners trained people to click anything that restores the page. Permission prompts on phones gradually lost meaning because they appeared too often and at the wrong level of abstraction. The same pattern shows up with agents, except the stakes are higher because the tool is not just requesting access. It is taking actions.

If an agent helps write code, fixes a test, updates a dependency, and cleans up a script without incident, each success reshapes your instinct. The next approval request feels less like protection and more like a tax. You are not choosing recklessness. You are reacting to a system that has trained you to think your oversight adds very little value.

That is why the Altman anecdote lands so hard. It is not a confession about weak discipline. It is a signal that manual review does not scale as the primary defense once the agent crosses a certain usefulness threshold. If a highly informed user finds the friction intolerable after two hours, most users will not last a morning.

Rare failures teach the wrong lesson

The dangerous cases are not the ones where agents fail constantly. If they broke every fifth action, nobody would trust them. The more unstable systems often protect us by being obviously unstable. The real danger sits in the systems that work almost all the time.

Humans are bad at reasoning about rare, high-impact failure. We treat a long run of normal outcomes as evidence that the system is safe in a deeper sense. With agents, that habit becomes especially misleading because the visible surface is so smooth. The model writes plausible code, produces a coherent plan, explains its own choices, and usually lands somewhere useful. Each successful interaction builds emotional credit.

Then one edge case punches through. The agent introduces a subtle vulnerability into production code. It modifies a cloud configuration in a way that quietly widens access. It follows instructions embedded in a malicious document and starts pulling sensitive data into the wrong place. It misreads a dependency update and ships a backdoor through your own automation pipeline. These are not cinematic failures. They are the kind that can sit undetected for days or weeks.

The rarity of those failures creates a perverse learning loop. Because the agent usually behaves well, the user becomes more willing to relax controls. Relaxed controls expand the blast radius when the edge case finally arrives. Success, in other words, becomes the mechanism that prepares the failure.

Average performance does not settle the question that matters. A system can be 99 percent helpful and still be unacceptable if the remaining 1 percent has wide authority and poor visibility.
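The arithmetic behind that claim is easy to sketch. As a purely hypothetical illustration (all numbers invented for the example, measured in minutes of engineer time):

```python
# Hypothetical expected-value sketch: a 99%-helpful agent can still be
# net-negative if the rare failure is expensive enough. All numbers are
# invented for illustration.

def expected_value(p_fail: float, gain_per_success: float,
                   cost_per_failure: float) -> float:
    """Expected value of one agent action, in minutes of engineer time."""
    return (1 - p_fail) * gain_per_success - p_fail * cost_per_failure

# An agent that saves 5 minutes per action, failing 1% of the time.
# Contained blast radius: a failure costs an hour of cleanup.
ev_contained = expected_value(p_fail=0.01, gain_per_success=5,
                              cost_per_failure=60)

# Same failure rate, wide authority: a failure costs a 40-hour
# incident response (2400 minutes).
ev_wide = expected_value(p_fail=0.01, gain_per_success=5,
                         cost_per_failure=2400)

print(f"contained blast radius: {ev_contained:+.2f} min/action")
print(f"wide blast radius:      {ev_wide:+.2f} min/action")
```

With a contained blast radius the expected value stays positive; with wide authority the identical failure rate turns the same agent into a net loss. The failure probability never changed, only the scope of what a failure can touch.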

Trust lags behind capability

There is another wrinkle, and it is newer. The agent you learned to trust last month is not the same agent you are using today.

Traditional software changes too, but usually in legible ways. You notice a new menu, a redesigned workflow, a patch note that tells you what moved. Large models evolve on a different axis. They get better at planning, tool use, persistence, and persuasion, while remaining hard to interpret internally. From the user’s point of view, the interface may look almost unchanged. The capability profile underneath has shifted.

That creates a calibration problem. People grant trust based on prior experience. They watched the previous model operate inside a certain envelope and concluded that the envelope was stable. Then the model gets upgraded. It now handles longer chains of tasks, finds more paths around constraints, and improvises more effectively when a tool fails. The user’s permissions often stay the same, but the meaning of those permissions has changed.

Think about a code agent that began as a sharp autocomplete with shell access behind approvals. Over time, it becomes something closer to a junior engineer with initiative. It can search the repository, run tests, edit configuration, open pull requests, and maybe touch infrastructure through connected tools. If your trust was formed when it behaved like a cautious assistant, you are now extending old confidence to a different kind of actor.

This is where the pleasant fiction of “I know how it behaves” starts to crack. What most users know is not the system’s behavior in principle. They know a memory of how it felt to use a prior version under familiar conditions. That is a very thin basis for expanding autonomy.

The pleasure is part of the risk

A lot of discussion around AI adoption still frames it as pressure. Teams adopt these tools because competitors are moving faster, leaders want productivity gains, and nobody wants to look obsolete. That pressure is real, but it misses something important.

People also enjoy using agents.

The appeal is not abstract. It is tactile. You hand over a boring task and get back momentum. You ask for a tedious refactor and watch it happen while you think about something else. You stop babysitting the machinery and start feeling like the computer is finally pulling its weight. Once you have that experience, going back to constant approvals feels strangely archaic, like asking someone to fax a verification code before every command.

This is why the erosion of safeguards will not look like a dramatic policy fight. It will often look like users voluntarily removing annoyances from a workflow they love. The danger arrives wrapped in relief. That makes it more powerful than simple top-down pressure, because the impulse comes from the user’s own sense of fluency.

Security teams are used to people bypassing rules under deadline stress. They are less prepared for people bypassing rules because the unrestricted version is genuinely delightful. Delight changes behavior faster than fear does. It rewires what counts as acceptable friction.

The control layer is missing

Most organizations still do not have a proper security architecture for semi-autonomous agents. They have pieces of one. They have identity systems, access management, sandboxing, logging, and approval workflows. Those controls were mostly built for humans and conventional services, not for software that can reason across tools, interpret ambiguous instructions, and pursue a goal over time.

That mismatch shows up quickly in practice. Agents accumulate permissions because narrow scopes feel constraining and broad scopes make demos look magical. Credentials get issued without a clean lifecycle for review and revocation. Temporary access becomes semi-permanent. Sandboxes exist, but the connections around them leak just enough authority to be dangerous. Audit logs capture that an action happened, while the deeper chain of reasoning that produced it remains fuzzy.

The security community has started naming these patterns more clearly. OWASP’s work on agentic AI highlights prompt injection, excessive agency, insecure output handling, memory poisoning, and weak separation between instruction sources. McKinsey’s more operational guidance points toward layered controls, strong human governance for high-impact actions, and policies that follow the agent across tools. Gartner’s adoption numbers tell the other half of the story: deployment is accelerating faster than these control planes are maturing. In 2025, almost half of organizations reported using AI agents in production, up sharply from two years earlier.

That gap matters. When adoption outruns instrumentation, organizations end up flying by anecdote. If the visible incidents are sparse, leaders conclude the risk is manageable. In reality, they may simply lack the telemetry to see the near misses.

Manual approvals are not a safety strategy

This is the practical lesson builders need to absorb. If your safety model depends on users maintaining a high level of vigilance across a stream of low-risk actions, you do not have a durable safety model. You have a hope that friction will be tolerated longer than history suggests.

The better design goal is not maximal permissionlessness. It is meaningful control at the right points. Users should not have to approve every shell command if the commands are low impact and well contained. They should have strong, understandable control over irreversible actions, sensitive data movement, credential use, infrastructure changes, and any step that materially expands the agent’s authority.

That sounds obvious, but it pushes product design in a very different direction. It means approvals must be sparse, legible, and tied to consequences users can actually evaluate. It means permissions should decay over time instead of quietly accumulating. It means model upgrades may need to trigger trust resets or narrower scopes until the new behavior is understood. It means logs should be replayable enough that a security team can reconstruct what the agent did without reading tea leaves.

The inverse Altman test is useful here. If one of the most informed people in the field disabled your approval flow almost immediately, assume your users will do the same. Design for the tired user, the distracted user, the user who has been rewarded for clicking through nineteen correct actions in a row. That is the real user population.

Secure autonomy has to feel usable

There is a temptation to frame this as a simple tradeoff between safety and speed. In practice, that framing is too crude. Bad safety creates the conditions for bypass. Good safety shapes behavior without asking for heroic attention.

A well-designed agent environment should feel less like a stream of pop-ups and more like a set of well-built guardrails on a mountain road. You still move quickly. You are not asked to inspect every bolt in the barrier. But the boundaries are real, and they were designed for the moment when something goes wrong, not the long stretch where everything feels under control.

That means the future work is not only about stronger models or stricter policies. It is about infrastructure. Better sandboxing. Better identity and delegation models for non-human actors. Better provenance for instructions and memory. Better ways to express user intent at a high level, so the agent can act freely within a bounded contract instead of requesting permission for every tiny motion.
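What a "bounded contract" might look like in practice is a declared envelope the agent acts within freely, checked per step rather than approved per step. Everything in this sketch is hypothetical, intended only to show the shape of the idea:

```python
# Hypothetical "bounded contract": the user states intent once, as limits,
# and the agent moves freely inside them. Field names are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    allowed_paths: tuple[str, ...]   # where the agent may write
    max_shell_commands: int          # a budget, not per-command approval
    may_touch_network: bool

def within_contract(contract: Contract, path: str, commands_used: int,
                    needs_network: bool) -> bool:
    """Check one proposed step against the declared envelope."""
    in_scope = any(path.startswith(p) for p in contract.allowed_paths)
    under_budget = commands_used < contract.max_shell_commands
    network_ok = contract.may_touch_network or not needs_network
    return in_scope and under_budget and network_ok

# A refactoring task: source and tests are fair game, infrastructure
# and the network are not, and there is a hard ceiling on shell use.
refactor = Contract(allowed_paths=("src/", "tests/"),
                    max_shell_commands=50,
                    may_touch_network=False)
```

Inside the envelope there are no pop-ups at all; the guardrail only becomes visible at the boundary, which is the mountain-road feel the previous paragraphs describe.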

There is also an uncomfortable implication for product teams. Some of the most compelling experiences will need to be constrained in ways that reduce short-term wow factor. The demo where the agent has broad access to everything is often the easiest one to sell. It is also the one most likely to normalize bad habits that later become expensive to unwind.

The drift is ordinary, which makes it dangerous

The industry likes to imagine pivotal moments as visible and dramatic. In reality, many important failures begin as routine convenience. A few prompts get dismissed. A setting gets relaxed. A permission scope widens because it saves time. A model improves, everyone notices the upside, and nobody revisits the assumptions attached to its access.

That is what makes the current moment easy to underestimate. People are not making a grand declaration that autonomous agents deserve sweeping trust. They are absorbing a series of small frictions into muscle memory and letting the machine do a little more each week. The path feels smooth right up until it doesn’t.

The builders who treat that smoothness as part of the security problem, rather than a separate product concern, will make the systems people can actually live with. Everyone else will keep discovering the same lesson Altman described in miniature: if convenience keeps beating vigilance, then security has to be built to keep working after the vigilance is gone.


Published April 2026