11 min read

Anthropic Built an AI to Interview Workers About AI

Anthropic has done something slightly uncanny and completely logical. It used Claude to interview 1,250 professionals about how AI is changing their work.

That sounds like a stunt until you sit with the shape of it. A company building a model turned that model into a research instrument, then pointed it at the people already living with these tools. The machine became both subject matter and interviewer. Product research folded back on itself.

Most AI coverage still lives at one of two altitudes. Either it stares at benchmark charts, or it zooms out into labor-market prophecy. Anthropic’s “Interviewer” project sits in the messier middle, where people actually work. That is where the interesting signals usually hide: in the awkward habits, the private shortcuts, the parts people do not mention in meetings.

A research method that changes the room

The mechanics matter because this was not a chatbot with a clipboard. Anthropic describes a three-stage process. Claude drafted interview plans based on the researchers’ goals, human researchers reviewed and approved those plans, then participants completed adaptive 10- to 15-minute interviews on Claude.ai. Afterward, the system helped extract themes and quantify patterns across the full set of responses.
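To make the shape of that workflow concrete, here is a minimal sketch in Python. It is not Anthropic’s implementation, and every name in it is a hypothetical stand-in: ask_model is a placeholder for whatever chat-model call you have available, and InterviewPlan, run_interview, and extract_themes exist only to show how plan drafting, human sign-off, adaptive questioning, and theme extraction could slot together.

```python
# A minimal sketch of the three-stage workflow described above, not
# Anthropic's actual system. ask_model() is a hypothetical stand-in for
# any chat-completion call; the plan -> approval -> interview -> themes
# structure is the point, not the API details.

from dataclasses import dataclass


def ask_model(system: str, transcript: list[dict]) -> str:
    """Hypothetical wrapper around a chat model call."""
    raise NotImplementedError("plug in your own model client here")


@dataclass
class InterviewPlan:
    goals: str
    opening_questions: list[str]
    approved: bool = False  # stage 2: a human researcher signs off


def draft_plan(research_goals: str) -> InterviewPlan:
    # Stage 1: the model drafts an interview plan from the researchers' goals.
    questions = ask_model(
        system="Draft five open-ended interview questions for these goals.",
        transcript=[{"role": "user", "content": research_goals}],
    )
    return InterviewPlan(goals=research_goals,
                         opening_questions=questions.splitlines())


def run_interview(plan: InterviewPlan, get_participant_reply,
                  max_turns: int = 8) -> list[dict]:
    # Stage 3: a short adaptive interview; each follow-up question is
    # conditioned on what the participant has already said.
    assert plan.approved, "a human researcher must approve the plan first"
    transcript: list[dict] = []
    question = plan.opening_questions[0]
    for _ in range(max_turns):
        transcript.append({"role": "assistant", "content": question})
        answer = get_participant_reply(question)
        transcript.append({"role": "user", "content": answer})
        question = ask_model(
            system=f"You are interviewing about: {plan.goals}. Ask one follow-up.",
            transcript=transcript,
        )
    return transcript


def extract_themes(transcripts: list[list[dict]]) -> str:
    # Afterward: aggregate recurring themes across all interviews so they
    # can be quantified.
    joined = "\n---\n".join(str(t) for t in transcripts)
    return ask_model(
        system="List recurring themes and rough frequencies across these interviews.",
        transcript=[{"role": "user", "content": joined}],
    )
```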

That workflow sounds dry. It is not. Traditional qualitative research does not usually happen at this scale because it is slow and expensive. You recruit participants, schedule calls, run interviews, transcribe audio, code themes, compare responses, argue about coding, then do it again. If you want 30 interviews, that is normal. If you want 1,250, you need a serious budget and a lot of patience.

Anthropic compressed that process into something much closer to software. The interviews were still qualitative. People answered open-ended questions in their own words. The difference is that the interviewer could run continuously, adapt in real time, and never get tired or drift off script after the seventeenth conversation of the day.

There is a deeper shift inside that efficiency. When interviews become cheap enough, companies can ask different questions. They do not have to reserve qualitative research for rare, high-stakes moments. They can treat it as infrastructure. That opens the door to continuous listening, which sounds benign until you remember that the listener is also the vendor.

Anthropic published the raw data on Hugging Face, which matters more than the usual transparency boilerplate. Open transcripts let outsiders inspect the method, test the claims, and notice patterns the company may have missed. It does not erase the company’s incentives, but it raises the cost of hand-wavy interpretation.

Scale changes what qualitative research can do

The obvious headline is cost. Running 1,250 interviews with human interviewers would be expensive enough to limit the project before it started. Anthropic got a large sample of rich responses in weeks instead of months.

The more interesting point is that scale changes the function of interviews themselves. In the old model, qualitative work often exists to generate hypotheses that later get tested quantitatively. You talk to a few dozen people, identify recurring themes, then build a survey. With an AI interviewer, that sequence starts to blur. You can ask open questions at something approaching survey scale, while still following up on interesting answers.

That does not make this magical. Good interviewing is not only about asking the next plausible question. It is also about timing, trust, ambiguity, and noticing when someone is circling a truth they are not ready to state cleanly. Machines are getting better at parts of that. They are still brittle around the edges. A skilled human interviewer knows when silence is useful. A system usually fills it.

Even so, the practical gain is real. If you want to understand how teachers, analysts, lawyers, marketers, and engineers are actually fitting AI into their workdays, there is no cheap human-only method that gets you 1,250 nuanced conversations. An AI interviewer gives you a new instrument. The question becomes what kind of reality that instrument captures, and what it distorts.

The most revealing result is social, not technical

Some of the numbers look familiar. Eighty-six percent of respondents said AI saved them time. That will surprise almost nobody who has watched knowledge workers adopt these tools in private, then speak about them carefully in public.

The more revealing numbers sit next to it. Sixty-nine percent mentioned social stigma around using AI at work. Fifty-five percent expressed anxiety about AI’s future impact. Only 8 percent reported anxiety without describing any plan to adapt.

That combination tells a richer story than the standard “people fear automation” headline. Most respondents were already getting practical value. Many were still worried. Yet very few were simply frozen. People are not standing outside the wave, debating whether it exists. They are already in it, trying to keep their footing and protect their status at the same time.

One participant, a fact-checker, said that when a colleague recently complained about AI, they stayed quiet about their own use. That quote lands because it captures the current workplace mood better than any adoption chart. The pressure is not only about efficiency or job loss. It is also reputational. Workers are managing a split identity: publicly careful, privately experimental.

That social stigma matters because it changes the data companies see. If people hide usage from coworkers and managers, internal surveys will miss part of reality. If teams quietly rely on AI while pretending they do not, adoption is happening in a shadow economy of prompts, copy-pastes, and selective disclosure. The software may be new, but the social pattern is old. Tools that threaten a profession’s self-image often get used before they get admitted.

This is one reason Anthropic’s project matters beyond Anthropic. It suggests that the hardest part of AI adoption may not be model quality. It may be collective honesty. A workplace can absorb a lot of technical change if the gains are obvious. It struggles more when the tool creates ambiguity about effort, authorship, and competence.

People undercount how much they automate

The sharpest finding in the study is the gap between how people describe their work and what usage logs suggest. In the interviews, participants framed their AI use as roughly 65 percent augmentation and 35 percent automation. Anthropic compares that with prior analysis of real Claude conversations, which found a split closer to 47 percent augmentation and 49 percent automation.

That gap is not a small statistical curiosity. It points to a basic mismatch between self-perception and behavior. People like to imagine themselves using AI as a collaborator, editor, or thought partner. In practice, a lot of them are offloading chunks of work.

There are innocent reasons for this mismatch. A person may receive a draft from Claude, then heavily revise it outside the chat. The logs will overstate automation because the human effort happened later and somewhere else. Many workflows leak beyond the platform boundary. A model sees the prompt and the answer, then loses sight of the edits, the judgment, and the cleanup.

But there is also a status story here. “I use AI to speed up the boring parts” sounds professionally respectable. “I let AI do half the task” lands differently, even if both descriptions refer to the same workflow. Knowledge work still carries a moral attachment to visible effort. Automation threatens that attachment, especially in fields where the job identity rests on taste, expertise, or craft.

Another possibility is that people discount the importance of tasks they automate. If the model writes the first draft of a customer email, summarizes meeting notes, restructures a spreadsheet, or produces boilerplate code, that may feel like background labor. People may sincerely report that they are mostly augmenting their core work because the automated parts no longer count as the real thing. Anyone who has watched office software disappear into the furniture of daily work will recognize that move.

This is why usage logs and interviews need each other. Logs show behavior without context. Interviews reveal motives and meanings, though imperfectly. If you only trust the logs, you miss the human story. If you only trust the interviews, you risk hearing a flattering narrative.

The interviewer shapes the answer

Anthropic is admirably direct about the methodological awkwardness. Participants knew an AI was interviewing them about AI. That creates “demand characteristics,” the social cues that push respondents toward certain kinds of answers.

You can imagine the distortions in both directions. Some people may feel safer confessing habits to a machine than to a human researcher. There is less fear of judgment, fewer status cues, and no awkward face on Zoom when you admit you let a model draft half your report. Others may perform for the system, either by overstating sophistication or by keeping their answers generic. An AI interviewer can feel oddly intimate and strangely impersonal at the same time.

That does not invalidate the project. It simply means the instrument is part of the phenomenon being measured. In fact, that may be one reason the results are interesting. The study is not only about attitudes toward AI. It is also about what people are willing to say when AI becomes the conversational surface for self-reporting.

There is a quiet possibility here. Machines may be especially good at eliciting certain kinds of workplace truth precisely because they are not colleagues. Plenty of people would rather tell a system, “I use this tool and I hide it,” than tell a manager or a researcher tied to their professional world. We should not romanticize that. Trust in machines is uneven and fragile. Still, the social texture of disclosure changes when the listener has no visible ego.

Product feedback is becoming a live system

Anthropic can present this as social research, and it is. It is also product strategy with a lab coat that mostly fits.

A company that can run large-scale qualitative interviews through its own model gains a powerful feedback loop. It does not have to infer user needs from clicks, retention curves, or support tickets alone. It can ask people how they feel, what they fear, what they hide, and what they wish the tool understood about their work. That is a richer input into design than telemetry by itself.

You can already see the practical uses. If professionals feel stigma around AI use, the product team may build features that emphasize privacy, attribution, or controllable drafting. If people underreport automation, messaging may shift toward “assistance” even when workflows are becoming more automated under the hood. If anxiety is widespread but adaptation is active, companies may focus on making users feel competent rather than merely faster.

This is where the project becomes broader than one company. As models spread through work, the interface is no longer just a place where tasks get done. It is also a place where attitudes can be measured. The tool can ask for your draft, then ask how you felt about using it. It can help you finish the work, then collect the narrative around that work. Software has always generated behavioral data. AI systems can now generate interpretive data at scale.

That raises governance questions even when the intentions are good. A company studying the human impact of its own product can produce valuable insight and flattering theater in the same motion. Open data helps. Independent replication would help more. The central issue is not whether Anthropic is acting in bad faith. It is that the capacity itself is powerful. The firms building these systems are also building the best instruments for hearing how people adapt to them.

The important shift is who gets to hear the adjustment

For years, the story of AI at work has been told from the outside. Economists estimate exposure. Executives announce pilots. Critics warn about replacement. Workers keep making local deals with the tools in front of them, usually in ways nobody fully sees.

Anthropic’s interviewer does not solve that blind spot, but it narrows it. It shows a path toward listening at a scale that used to be reserved for dashboards, not conversations. That is the real novelty. The company is not only observing usage. It is collecting the human commentary that rides alongside usage: the embarrassment, the relief, the rationalization, the improvisation.

That kind of listening will spread because it is too useful not to. The same models that help draft documents, answer support queries, and summarize meetings can also ask follow-up questions and organize the replies into something legible. Once that becomes normal, every major platform will be tempted to turn conversation into a standing sensor for social change.

The challenge is making sure the people doing the adjusting are not merely legible to the companies causing the adjustment. Anthropic’s study hints at a future where workers can be heard in greater detail. It also hints at a future where firms hear first, frame first, and learn first. Those are not the same future, even if they run through the same interface.

End of entry.

Published April 2026