AI Can Scale Tutoring. School Is the Hard Part.
In 1984, Benjamin Bloom published a result that still feels faintly impossible. Students who received one-to-one tutoring performed about two standard deviations better than peers in conventional classrooms. The median tutored student looked like a top performer. In percentile terms, Bloom argued that an average student with individual tutoring could reach the level of the 98th percentile in a standard class.
Education has been staring at that number ever since.
The finding was intoxicating and maddening at once. It suggested that huge gains were possible without changing human cognitive limits, because the same children learned far more under different conditions. It also exposed a brutal economic fact: individualized tutoring works partly because it is individualized, and individualized labor does not get cheap just because society wants more of it.
For forty years, that gap between educational possibility and educational affordability has been one of the quiet scandals of modern schooling. We knew a better experience existed. We could not buy enough of it.
A benchmark the internet never solved
People sometimes talk about online education as if it already answered Bloom. It did not. Recorded lectures made instruction cheaper to distribute, but they did not make it personal. Learning management systems organized assignments, deadlines, grades, and discussion boards, which is useful in the same way a pantry is useful. A pantry is not dinner.
Massive open online courses taught us something awkward. Access to content is not the same as support. Millions of people can watch a lesson. Far fewer can persist through confusion, notice their own misconceptions, and recover from them without help. In many subjects, the wall is not information scarcity. The wall is what happens five minutes after you stop understanding.
That is where tutoring changes the shape of learning. A good tutor does not merely explain. They watch for the squint, the hesitation, the wrong analogy forming in the student’s head. They slow down when needed and accelerate when possible. They ask the next question at the exact moment the learner is ready for it. In practice, they turn a generic sequence into a live conversation.
Bloom’s result was never just about attention. It was about feedback loops.
What tutoring actually buys
The reason tutoring works is less mysterious than it first appears. A classroom teacher has to operate on averages. Even excellent teachers do. They have to move a group through material on a schedule, manage behavior, and keep the social fabric of the room intact. The student who is lost can hide. The student who is bored can drift. Both often do.
A tutor collapses that distance. Misunderstanding is surfaced early, before it calcifies into shame or avoidance. Feedback arrives while the thought is still warm. The explanation can be recast using basketball, K-pop, engines, baking, or whatever language the student already inhabits. That matters more than progressive education slogans ever quite captured. People think with what they already know.
There is also a psychological benefit that rarely shows up cleanly in spreadsheets. A tutor keeps the learner engaged in the task long enough to build momentum. School often confuses visible compliance with learning. A student can sit quietly, copy notes, and understand almost nothing. Personalized guidance makes silent disengagement harder to maintain.
Human tutors bring more than adaptation, of course. They bring trust, pressure, warmth, and sometimes the mild terror of disappointing someone who believes in you. Those forces are difficult to model and even harder to scale. Still, the core mechanics of tutoring are surprisingly legible: diagnose, respond, reframe, repeat.
That is exactly the part current AI systems are unusually good at approximating.
The economics finally changed
For the first time, the cost structure that made universal tutoring absurd is beginning to wobble. Large language models can converse indefinitely, explain the same concept in twenty different ways, and stay available at odd hours when actual adults are asleep or sensibly doing anything else. They can generate endless practice, personalize examples, and react to a student’s confusion in real time.
This is why so many people in education technology sound newly evangelical. The old dream was always obvious. Give every learner a patient, adaptive guide. The obstacle was not imagination. It was payroll.
Anthropic’s education team has described the ambition plainly: continuous, individualized tutoring available to anyone, anywhere. If you strip away the product language, the claim is simple. The machine can now play enough of the tutor role to make Bloom’s old benchmark feel reachable at population scale.
There are already concrete examples that feel less like science fiction than like sensible classroom design. A teacher asks students about their interests, then uses a model to generate math practice tailored to each one. The underlying algebra stays the same. The wrapping changes. Skateboarding for one student, fashion for another, football analytics for a third. Engagement rises because the task no longer arrives in a dead dialect.
The important point is not that personalization is cute. It is that motivation and comprehension often travel together. A problem written in your language is easier to stay with when it gets difficult.
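The mechanic is simple enough to sketch. Here is a minimal, illustrative version in Python: the arithmetic is fixed, only the story changes per theme. The theme texts and the `personalize` helper are invented for illustration; in a real classroom workflow an LLM would write the story problem rather than a template.

```python
# Illustrative sketch: same underlying math, different wrapping per student.
# The themes and templates are made up; a deployed system would have a
# language model generate the story around the fixed problem structure.
THEMES = {
    "skateboarding": "A skater lands {x} tricks per run and does {runs} runs.",
    "fashion": "A designer cuts {x} patterns per day for {runs} days.",
}

def personalize(theme: str, x: int, runs: int) -> str:
    """Render one themed word problem; the math (x * runs) is identical."""
    story = THEMES[theme].format(x=x, runs=runs)
    return f"{story} How many in total? (Answer: {x * runs})"
```

The design point is that personalization lives entirely in the surface text; the assessment target never moves, so every student is still practicing the same skill.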
The machine also makes cheating frictionless
This is where the story stops being clean.
Anthropic has said that roughly 47 percent of student interactions with Claude are “transactional” and show little engagement. Put less politely, many students use the system as an answer dispenser. They paste the assignment, collect the output, and move on. If Bloom’s result depended on the learner doing more of the cognitive work, these interactions do the opposite. The model absorbs the effort the student needed to expend.
That inversion matters. We have built tools capable of supporting practice, reflection, questioning, and revision. Students are often rewarded for using them to skip those activities. It is not a character flaw unique to this generation. It is a rational response to incentives. If school measures submitted work more reliably than understanding, students will optimize for submitted work.
The irony is sharp. The same system that can act like a tutor can also impersonate the student. One moment it is Socrates. The next it is a ghostwriter with excellent formatting.
This is why debates about AI in education often feel weirdly unproductive. One camp says the tools are transformative. Another says they are ruining learning. Both are looking at real behavior. They are just observing different defaults. Left alone, a powerful assistant tends to become a convenience engine. Designed carefully, the same assistant can create more struggle in the useful sense: the kind that builds durable understanding instead of performative suffering.
The difference is product design and institutional design, not model capability.
School contains more than instruction
There is another reason “AI tutor for everyone” does not automatically solve education. School is not only a knowledge transfer system, even if the software industry often talks as if it were.
Students learn in institutions that also teach scheduling, collaboration, conflict, patience, social navigation, and the strange skill of functioning around other people you did not choose. They build peer networks. They encounter authority. They discover what they are good at and what still embarrasses them. Some of that is frustrating and inefficient. Some of it is the point.
A model can help a student master factoring polynomials or understand photosynthesis. It cannot fully replace the social experience of being in a room where your confusion, confidence, boredom, generosity, and insecurity all bounce off other humans. Anyone who has been on a group project knows this can be a mixed blessing. It is still education.
That matters because AI is strongest at the piece of schooling that looked easiest to modularize in the first place: explaining, summarizing, drilling, testing, translating. If those functions become cheap and ambient, institutions will feel pressure to justify the rest of what they do. Some will respond intelligently. Others will retreat into artificial scarcity and paper-based rituals.
You can already see the reflex. Teachers who fear AI-assisted cheating move programming tests onto paper. Students write Python by hand as if the clock stopped somewhere around 1987. The absurdity is obvious, but so is the institutional logic. When the surrounding system has not adapted, regression looks safer than redesign.
The missing layer is not intelligence
The most interesting recent education work around AI is less about smarter models than about constraining them productively. Anthropic’s “Learning Mode” is a good example. Instead of simply answering, the system is nudged to guide the student toward an answer, asking questions, breaking down steps, and refusing to short-circuit the task too quickly.
The striking detail is that students themselves asked for this. Some reportedly used the phrase “brain rot” to describe what happens when the assistant becomes too convenient. That is a more revealing diagnosis than a lot of faculty memos. Students know the difference between completing work and learning. They just inhabit systems where those goals often diverge.
A tool like Learning Mode tries to realign them. Upload the assignment, and the model behaves more like a coach than a subcontractor. It can generate flashcards for review, provide targeted practice, and integrate with classroom platforms such as Canvas. None of this requires a breakthrough in core AI research. It requires taking pedagogy seriously enough to encode it.
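To make "encoding pedagogy" concrete, here is a tiny sketch of the idea, not Anthropic's actual implementation: a guide-don't-answer policy expressed as instructions placed ahead of the student's request. The policy text and `build_messages` helper are hypothetical.

```python
# Illustrative only: not Anthropic's actual Learning Mode prompt.
# The point is that the pedagogy lives in instructions and refusal
# rules layered around the model, not in a smarter model.
TUTOR_POLICY = """\
You are a tutor, not an answer service.
- Ask what the student has tried before explaining anything.
- Break problems into steps and confirm each one.
- Never state the final answer until the student proposes one.
"""

def build_messages(assignment: str, student_msg: str) -> list[dict]:
    """Assemble a chat request that puts the policy ahead of the task."""
    return [
        {"role": "system", "content": TUTOR_POLICY},
        {"role": "user", "content": f"Assignment:\n{assignment}\n\n{student_msg}"},
    ]
```

Everything interesting here is in the policy string, which is the sense in which this is an interface problem rather than a research problem.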
That sounds almost disappointingly practical, which is probably a sign it is the right layer to focus on. The frontier model gets headlines. The interface determines behavior.
There is a broader lesson here. Most technological failures in education are not failures of raw capability. They are failures to shape incentives at the moment of use. Give students a frictionless path to outsourced thinking, and many will take it. Give them a system that scaffolds effort, exposes gaps, and rewards process, and you start to recreate the conditions Bloom was pointing toward.
Assessment is the next piece to crack
The really disruptive shift may arrive through assessment rather than tutoring.
Traditional exams are periodic snapshots. They sample what a student can produce under timed, constrained conditions, often on a bad day, in an artificial setting, with all the familiar distortions of stress and test design. They persist because they are legible. Institutions like clear moments of judgment.
AI makes another model plausible: continuous evidence gathering. If a student learns through a conversational system that tracks errors, recoveries, explanations, and repeated misunderstandings over time, you can build a far richer picture of mastery than a midterm ever captured. The test starts to dissolve into the learning process itself.
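The record-keeping behind continuous assessment can be surprisingly small. A hedged sketch, with invented names throughout: one running tally of misconceptions observed and self-corrections, from which persistent trouble spots can be surfaced.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MasteryRecord:
    """Running evidence about one learner, gathered across sessions."""
    errors: Counter = field(default_factory=Counter)      # misconception -> times seen
    recoveries: Counter = field(default_factory=Counter)  # misconception -> times self-corrected

    def observe(self, misconception: str, recovered: bool) -> None:
        self.errors[misconception] += 1
        if recovered:
            self.recoveries[misconception] += 1

    def persistent(self, min_seen: int = 3) -> list[str]:
        """Misconceptions seen repeatedly and rarely self-corrected."""
        return [m for m, n in self.errors.items()
                if n >= min_seen and self.recoveries[m] < n / 2]
```

A structure like this accumulates exactly the signal a midterm discards: not just whether an error occurred, but whether the student noticed and recovered.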
That possibility is easy to oversell. Continuous assessment raises nasty questions about surveillance, data rights, and how much inference a system should be allowed to make about a learner’s mind. It also depends on trust in the underlying models, which still hallucinate, misread context, and reflect the biases of their training data. A bad examiner who never sleeps is not progress.
Even so, the direction is compelling. A teacher who knows that one student consistently confuses correlation with causation, while another applies the concept correctly in conversation but freezes on formal wording, has a much better basis for intervention than a number at the top of a quiz. Assessment becomes less theatrical and more diagnostic.
If that happens, the institution changes with it. The classroom stops orbiting the exam quite so tightly. Cramming loses some of its power. So does the old game where students learn the shape of the test rather than the structure of the subject.
The bottleneck moved from compute to institutions
It is tempting to read all this as a straightforward technological victory waiting for adoption. The reality is messier. Schools and universities change slowly for reasons that are sometimes frustrating and sometimes wise. Education is one domain where “move fast and break things” translates into breaking years of a child’s development.
At the same time, institutions cannot hide behind slowness forever. The technology is changing on a six-month cadence. Families, students, and employers are already adapting around formal systems. When institutions fail to offer a coherent way to use AI for learning, they do not preserve the old model. They simply drive usage underground, where the worst incentives dominate.
The open question is not whether students will have machine assistance. They already do. The question is whether schools can shape that assistance into something that deepens learning instead of hollowing it out.
That requires a more ambitious response than plagiarism detectors and honor code updates. It means redesigning assignments so that process is visible, not just outputs. It means building classroom workflows where AI can personalize practice without replacing thought. It means training teachers in prompt design, error analysis, and tool limitations, which is a more serious task than dropping a chatbot icon into the school portal and declaring innovation achieved.
For younger students, the challenge gets even more basic. What should an eight-year-old learn in a world where explanation is ambient and external memory is everywhere? Probably more fluency in reading, writing, numeracy, and attention than some futurists want to admit, because those remain the substrate for judging whether the machine makes sense. You cannot outsource discernment if you never built it.
A forty-year-old dream, with new constraints
Bloom’s old puzzle is finally solvable in one narrow sense. We can now approximate personalized tutoring at a scale that would have looked ridiculous a decade ago. The machine is available, patient, adaptable, and cheap enough to spread.
But the hardest part was never only the explanation engine. It was arranging the surrounding system so that explanation becomes learning rather than substitution. That means products that invite effort instead of bypassing it, assessments that reward actual understanding, and institutions willing to redesign themselves without pretending the classroom was perfect before the chatbot arrived.
The seductive story is that AI will democratize elite education by giving everyone a tutor. The truer story is more demanding. AI can supply the tutor-like layer. Whether that becomes better education depends on everything around it: incentives, interfaces, teachers, norms, and the social purposes we still want schools to serve. The breakthrough is real, but it lands in a human system that does not automatically improve just because the software did.
Published April 2026