
LLMs as Mirrors: The Library of Babel and the Illusion of the Answer

Paul Ford asked Claude to sketch the future of consulting. He nudged it mildly bearish. Claude came back with the full treatment: plausible industry curves, corporate winners and losers, even little narrative fragments about McKinsey people watching their world come apart. It was polished enough to feel diagnostic. It also said a great deal about Paul Ford.

That is the part people miss when they talk about large language models as if they were forecasting engines. The system did not reach into 2035 and return with a weather report. It took the framing, mood, assumptions, and vocabulary embedded in the prompt, mixed them with a huge statistical memory of business language, and produced a coherent story that fit the requested shape.

If Ford had asked for the bullish case five minutes later, Claude could have produced one with equal confidence and roughly equal elegance. Ford himself more or less said so. He had the honesty to admit that the answer felt like a mirror for his own anxiety in that moment, not a supernatural glimpse of the future.

That distinction changes how you should use these systems. It changes how much you should trust them. It also changes what kind of mistakes you should fear.

The wrong expectation

A lot of confusion comes from treating every question as if it belonged to the same category. It does not.

Some questions have a single correct answer, even if finding it is annoying. What was our profit this quarter. How many customers churned in October. Which invoice was unpaid. These are boring in the best possible way. You want one number, one row, one definite result. Traditional software shines here because the world has already been forced into structure. The database is the answer space.

Other questions are nothing like that. What happens to consulting when clients can generate slide decks, market maps, and operating models on demand. How should a company position a new product in a crowded market. What will universities look like when every student has a tireless writing assistant. These questions do not hide a single golden fact. They open into a field of plausible interpretations.

Dan Shipper has a useful way to frame the difference. One class of problems is like finding a needle in a haystack. The answer exists. It is rare, but definite. The other is closer to Borges’s Library of Babel, an infinite library containing every possible book. Somewhere in that library there are true books, wise books, prophetic books, and deranged books that merely look convincing from a distance. The hard part is not retrieval. It is navigation.

Language models are far better at the second class than the first. That is also why they are so easy to misuse. We keep approaching them with the emotional expectation of the needle problem while using them in the library problem.

The interface does not help. A blank chat box invites you to ask a question the way you would ask a competent colleague. Human conversation carries a quiet promise. The other person may be biased or confused, but they are at least trying to tell you what they think is true. A language model is doing something stranger. It is assembling a likely continuation that fits your request and its training distribution.

That does not make it useless. It makes it different.

Inside the library

Borges’s image matters because it captures both the power and the danger of generative systems. An infinite library is not valuable because it contains everything. It is maddening because it contains everything. Truth is in there, but so is every seductive distortion that sits two inches away from truth and dresses better.

A language model does not search the world and then report back. It maps your prompt into a space of possible continuations and starts walking. The path it takes is shaped by probabilities, by learned associations, by your wording, by any documents in context, and by the system’s tuning. Give it “mildly bearish” and you have already narrowed the aisle. Ask for the future of consulting and you have invited a genre: strategic analysis, sector disruption, market structure, incumbents under pressure, charts if available, melancholy if useful.

That is why these systems can feel uncannily smart while also being slippery. Coherence is not proof of contact with reality. It is proof that the model is good at building local consistency over many sentences. This is a real achievement. It is not the same achievement as knowing what will happen.

You can see the difference most clearly when you ask for scenarios. A good model can generate several futures for consulting that all make sense on their own terms. In one, routine analysis becomes cheap, firms lose margin, and junior roles disappear. In another, AI expands the market by lowering the cost of bespoke work, consultants move up the stack, and clients buy interpretation rather than raw synthesis. Both stories can be defended. Neither becomes true because the prose lands well.

That does not mean every answer is equally good. Some are shallow, some contradict known constraints, some ignore incentives, and some are little more than clichés with a consultant accent. The useful work is judging among plausible stories, not pretending the machine has solved the question for you.

People often ask whether this means LLMs are basically fancy autocomplete. In a technical sense, the phrase is unfairly reductive. In a practical sense, it points in the right direction. Autocomplete sounds trivial because email suggested “best regards” and phone keyboards guessed “meeting.” At scale, prediction over language becomes a machine for producing essays, plans, code, summaries, personas, and strategic narratives. The mechanism is still prediction. The outputs just got expensive-looking.
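
For readers who want the mechanism made concrete, here is a toy sketch rather than any real model: a tiny hand-written next-word table and a temperature-controlled sampler. Every word and probability in it is invented for illustration; the point is only that the whole trick is "given the text so far, pick a likely next word," repeated until something essay-shaped falls out.

```python
import random

# A toy "language model": given the previous word, a probability
# distribution over the next word. Every number here is invented.
NEXT_WORD = {
    "consulting": {"faces": 0.4, "thrives": 0.3, "collapses": 0.3},
    "faces": {"disruption": 0.7, "headwinds": 0.3},
    "thrives": {"on": 0.6, "by": 0.4},
    "collapses": {"under": 1.0},
}

def sample_next(prev, temperature=1.0):
    """Pick the next word. Lower temperature favors the likeliest word."""
    dist = NEXT_WORD.get(prev)
    if not dist:
        return None  # no known continuation; stop generating
    words = list(dist)
    weights = [p ** (1.0 / temperature) for p in dist.values()]
    return random.choices(words, weights=weights)[0]

def continue_text(start, max_words=6, temperature=1.0):
    """Generate text by repeatedly predicting a likely next word."""
    words = [start]
    while len(words) < max_words:
        nxt = sample_next(words[-1], temperature)
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(continue_text("consulting", temperature=0.7))
# e.g. "consulting faces disruption" -- coherent, and entirely the
# product of the starting word plus the distribution.
```

Real systems do this over tokens with billions of learned parameters instead of a four-entry dictionary, but the loop has the same shape: the output is a walk through a learned distribution, steered by whatever you put at the start.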

The mirror hidden in the prompt

Prompts carry more psychology than users realize. They carry priors. They carry fear. They carry aspiration. Sometimes they carry a desired tone that quietly determines the argument before the first sentence appears.

Ask, “What happens to consulting when AI destroys the value of junior analysts,” and the model will build around destruction. Ask, “How does consulting evolve when AI handles first-draft analysis,” and you may get adaptation, not collapse. The words are close. The mental worlds are not.

This is where the mirror metaphor earns its keep. The system reflects you, but not like a flat mirror in a bathroom. It reflects like a curved mirror in a funhouse built by statisticians. Some features are amplified. Some are smoothed. Some are made legible for the first time because they were implicit in your phrasing and became explicit in the answer.

That can be useful. A founder asking for a risk analysis may discover that every prompt assumes scarcity, competition, and cannibalization, which says something about the founder before it says anything about the market. A manager who keeps asking for messaging that sounds “authoritative” may get back drafts that reveal their appetite for certainty more clearly than any self-description would.

The danger appears when the reflected mood returns wearing the costume of external judgment. Your hunch becomes “what the model thinks.” Your anxiety returns as a report. Your institutional bias comes back with headings and subheads, which makes it feel mature.

This matters more in organizations than in private use. Individuals can get spooked by a dramatic answer and move on. A company can paste that answer into a deck, circulate it upward, and let a probabilistic echo harden into strategy. Once a machine-generated narrative has graphs, logos, and a neat executive summary, it becomes surprisingly hard to remember that it began life as a shaped request.

You can already see this in boardrooms and Slack channels. Someone asks for a market outlook, gets a smooth synthesis, and the conversation shifts from “is this the right frame” to “what should we do about it.” The frame sneaks past inspection because the output arrived looking finished.

Chat makes the confusion worse

Paul Ford has argued that one of the biggest mistakes AI companies make is anthropomorphizing these systems. The chat interface is central to that mistake. It gives statistical generation the social texture of dialogue.

This is not just a branding issue. Design choices train habits. When a system says “I think” or “I believe,” many users stop tracking the mechanics under the hood. The words are tiny, but they pull us toward a model of mind. We start attributing judgment, conviction, even temperament. We are apes with language. We take the bait.

Anthropic and OpenAI are hardly alone here. The whole product category leans on conversational ease because it lowers friction. People know how to talk. They do not know how to write formal queries or build retrieval pipelines. Chat was the bridge to mass adoption.

Dan Shipper makes a fair counterpoint. Anthropomorphism is not only a bug. It is also a compatibility layer. Humans have rich instincts for handling messy conversational partners. We know how to clarify, probe, challenge, and ask for another take. If the system felt like a spreadsheet with attitude, fewer people would get value from it.

I think both views are right, which is inconvenient but useful. Human-style interaction helps us access the tool. Human-style interaction also tempts us to grant the tool more authority than it has. We borrow our social skills to use it, then accidentally import our social credulity along with them.

There is another asymmetry that matters. When a person answers you, they usually have some stake in the answer. They may be protecting a reputation, hiding uncertainty, or trying to impress you, but they are still attached to what they say. The model has no stake. It has no lived cost for being wrong, no career risk, no embarrassment, no memory of the damage. It can generate a doomed restructuring memo and sleep beautifully, if sleeping were available.

That absence of stake is easy to forget because the language keeps inviting us into a social frame. The system sounds like someone. It is not someone.

The places where these systems are genuinely strong

Seeing the mirror does not force you into cynicism. It simply tells you where value actually comes from.

These models are excellent at exploring a possibility space. They can help you enumerate scenarios faster than a team staring at a whiteboard. They can surface assumptions hiding inside a strategic plan. They can rewrite an argument from the other side with unnerving fluency. They can compress a pile of documents into usable shape. They can propose ways to structure ambiguity, which is often the first real step in thinking.

They are also good at playing out consequences. If consulting firms lose margin on routine analysis, what happens to hiring ladders, apprenticeship, pricing, and client expectations. If the opposite happens and demand expands, which capabilities become scarce. A model can extend each branch far enough to reveal second-order effects that a rushed human might skip.

That is real intellectual leverage. It is not prophecy.

There is another nuance worth keeping in view. Some tasks that look like library problems can be pushed closer to needle problems with the right scaffolding. If you connect a model to your internal data, constrain its tools, and ask it to retrieve rather than invent, you can get reliable answers about revenue, inventory, support tickets, and contracts. The language model becomes an interface over systems that do have definite answers. In those moments, the useful part is often not the generation itself. It is the bridge between messy human language and structured sources.
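
Here is a minimal sketch of that scaffolding, with everything assumed rather than drawn from any particular product: the model call is stubbed out as `choose_query`, it is only allowed to pick from a short whitelist of SQL queries, and the number that comes back always comes from the database, never from generated prose.

```python
import sqlite3

# Whitelisted queries the model may choose between; it never writes SQL itself.
QUERIES = {
    "quarterly_profit": "SELECT SUM(revenue - cost) FROM orders WHERE quarter = ?",
    "october_churn": "SELECT COUNT(*) FROM customers WHERE churn_month = '2025-10'",
}

def choose_query(question: str) -> tuple[str, tuple]:
    """Stand-in for a constrained model call (tool or function calling).

    A real system would let the model pick a query name and parameters;
    here a trivial keyword match keeps the sketch self-contained.
    """
    if "profit" in question.lower():
        return "quarterly_profit", ("2025-Q4",)
    return "october_churn", ()

def answer(question: str, conn: sqlite3.Connection):
    name, params = choose_query(question)
    row = conn.execute(QUERIES[name], params).fetchone()
    # The model framed the request; the database supplied the number.
    return row[0]
```

The design choice doing the work is the boundary: generation translates messy language into a structured request, and the definite answer never passes through the generative step.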

People collapse these modes too easily. They ask a freeform model for a hard number, get a plausible guess, and blame the idea of AI. Or they ask a tool-augmented model for strategic judgment, get a polished synthesis, and mistake that synthesis for decision-making. Same interface, very different epistemic terrain.

Competence looks like friction

The best users I know do something slightly annoying on purpose. They introduce friction after the model has made things feel smooth.

If an answer confirms their instinct, they ask for the strongest contrary case. If it gives a market forecast, they ask which assumptions would reverse the conclusion. If it proposes a strategy, they separate the data claims from the interpretation claims and verify the first set independently. If the result sounds too clean, they get suspicious.

You do not need a ritual for this. You need habits.

One habit is prompt variation. Ask the same strategic question three ways and compare the answers. The differences tell you something about the instability of the problem and the hidden commitments in your wording. Another habit is assumption extraction. Ask the model to list the premises it relied on, then inspect those premises like a hostile editor. A third is role shifting. Make it answer as an incumbent, a competitor, a regulator, a buyer, or a skeptical investor. The contradictions are often more valuable than the prose.
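
None of this needs special tooling. A loop over variant prompts is enough, as in the sketch below, which assumes a placeholder `ask(prompt)` function standing in for whatever model call you already use; the framings and roles are illustrative, not canonical.

```python
from typing import Callable

def pressure_test(framings: list[str], roles: list[str],
                  ask: Callable[[str], str]) -> dict[str, str]:
    """Ask variants of the same question; the spread between answers is the point."""
    answers = {framing: ask(framing) for framing in framings}
    for role in roles:
        answers[role] = ask(f"Answer as {role}: {framings[0]}")
    # Assumption extraction: make the premises explicit, then read them
    # like a hostile editor instead of admiring the prose.
    answers["premises"] = ask(
        "List, one per line, the assumptions a confident answer to this "
        f"question would rest on: {framings[0]}"
    )
    return answers

# Illustrative framings for the consulting question; the wording is the variable.
framings = [
    "What happens to consulting when AI destroys the value of junior analysts?",
    "How does consulting evolve when AI handles first-draft analysis?",
    "What would have to be true for AI to expand demand for consulting?",
]
roles = ["an incumbent partner", "a skeptical investor", "a client buying the work"]
```

The dictionary that comes back is not a report. It is raw material for noticing where the answers agree, where they diverge, and which premises none of them bothered to defend.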

This is where the mirror becomes a tool instead of a trap. Once you stop treating the first answer as a verdict, you can use the system to expose your own framing from multiple angles. The object of interest is no longer a single completion. It is the landscape of completions around your question.

That shift also protects against a subtler failure mode: emotional outsourcing. Many people use these systems when they feel stuck, underinformed, or anxious. The smoothness of the response offers relief. It feels like having an advisor who never gets tired. In that state, agreement can feel like intelligence because it lowers internal tension. A little procedural resistance keeps you from confusing comfort with insight.

What this means for the future of professional judgment

Consulting is a revealing example because the industry lives on two different kinds of value at once. Some of its work is structured analysis. Some of it is organized storytelling under uncertainty. The first category is vulnerable to automation in obvious ways. The second is where language models both help and distort.

A lot of executive work has always involved choosing among persuasive narratives about incomplete evidence. LLMs accelerate that process dramatically. They can produce the kind of internally consistent memo that once took a small team and several days. That lowers the cost of analysis-shaped language. It may also lower the average quality of judgment if companies forget that polished language is not the scarce resource.

This is not a consulting problem alone. It shows up anywhere decisions are made through documents. Product strategy, hiring plans, regulatory positioning, M&A theses, internal policy, public messaging. If the machine can generate ten plausible stories before lunch, the bottleneck moves. The scarce skill becomes knowing which story deserves belief, which one deserves testing, and which one simply flatters the room.

That is a more human bottleneck than many people expected. It rewards taste, domain knowledge, memory of previous failures, and a feel for incentives that models still lack in any grounded sense. It also rewards something less glamorous: the willingness to say, “this sounds good, but I do not know if it is true.”

In practice, the organizations that benefit most from LLMs may not be the ones that trust them most. They may be the ones that turn them into pressure-testing devices. A good team can use a model to generate arguments quickly, then tear those arguments apart with evidence and context. A weak team can use the same model to launder existing bias into neutral-sounding text.

The difference is cultural before it is technical.

Seeing the output for what it is

Once you internalize the library metaphor, the emotional texture of using these systems changes. The first answer loses some of its spell. You stop asking, “what does the model think will happen,” and start asking, “what kind of story did this prompt make easy to tell.”

That is a healthier question because it points back toward agency. You can change the prompt. You can inject better evidence. You can compare scenarios. You can decide that some questions need databases, experiments, or interviews instead of generated prose. You can treat the model as a collaborator in exploration without pretending it has earned the right to adjudicate reality.

Paul Ford’s consulting prompt is memorable for exactly this reason. The response looked like analysis, but the important analysis was happening one layer up. The system was not just talking about consulting. It was revealing how readily a capable model can turn a mood into a worldview and hand it back with citation-shaped confidence.

That is the illusion of the answer. The text arrives finished, so it feels settled. In reality, you are still standing in the library, surrounded by shelves of plausible books, holding one volume that happens to fit the question you asked.


Published April 2026