If AI Models Have No Moat, What Are Investors Buying? with Benedict Evans

Leong Chung Wei Bernard 25.Jun.2026

Benedict Evans argues AI models are becoming commodity infrastructure, and that durable value depends on products and use cases — not better models or moats.

Fresh out of the studio, Benedict Evans, independent technology analyst and author of AI Eats the World, returns to explore whether the AI model layer is becoming commodity infrastructure. Benedict argues there is no winner-takes-all effect in models yet, drawing parallels to telecoms, cloud, chips and the fiber bubble to ask where durable value actually accrues when everyone runs similar infrastructure on similar tokens. He unpacks why the chatbot remains a poor interface, introduces the "blank screen" and "jagged frontier" problems that keep software companies alive, and explains why large language models inherently give you "the average." Closing the conversation, Benedict reflects on the indicators that would show AI has truly eaten the world — and why the answer is better products, not better models.

"When you automate away work, you can always see the jobs that are going away because they're right there. And you don't know what the new jobs are going to be. Human needs are infinite. How many people are earning a living from making podcasts now? Imagine predicting that 10 years ago. There's a stage in the evolution of the market where, if you're still arguing about that, you're an idiot. But there's a stage at the beginning where you might have opinions about some of these questions, and you're probably not even asking the right questions. That, I think, is where we are with this stuff today." — Benedict Evans

Profile: Benedict Evans, Independent Technology Analyst (LinkedIn, Personal Blog, Newsletter)

Here is the edited transcript of our conversation:

Bernard Leong: Welcome to Analyse Podcast, the premium podcast dedicated to dissecting the pulse of business, technology and media globally. I'm Bernard Leong. The AI cycle is no longer just about better models. It is becoming a capital markets test of infrastructure, distribution and energy. Anthropic and OpenAI are moving towards public markets. SpaceX is preparing one of the most consequential listings in technology, and the launch this morning of Claude Fable-5 (note: Fable-5 is no longer available) shows how frontier capability now comes bundled with safety, access and governance questions.

These headlines point to the same deeper story: massive CapEx, increasingly commoditized models, shallow but enormous usage, and a new battle over who controls the curation layer over infinite intelligence. Today we test whether the model layer becomes a utility, where durable value actually accrues, and how AI reshapes work and competition globally. With me today, recurring guest Benedict Evans, the independent technology analyst behind AI Eats the World. Benedict, welcome back.

Benedict Evans: Thanks for having me.

Bernard Leong: Since our last conversation, what have you been up to recently?

Benedict Evans: Been on lots of airplanes. People want to know about this AI thing. You may have heard about it.

Bernard Leong: I want to go straight to the question. Early this year you started updating your presentation every six months, and AI Eats the World is now in its 2026 edition. So I want to start with the biggest question: OpenAI, Anthropic, and the public market test. One article I really enjoyed, since I'm a subscriber to your newsletter, was on how the frontier models are near commoditization with no network effects. Everybody is now trying to buy compute time, and that is not a moat. Both OpenAI and Anthropic are heading towards public market capitalizations near a trillion dollars. So what is the public investor actually buying, if not a moat?

Benedict Evans: Well, there are several different questions embedded in that. The first is that at the moment there's no apparent winner-takes-all effect in models. This is a subtle but important distinction. Are there things you're doing that no one else can do, no matter how hard they try, that they won't be able to catch up on no matter how much money they spend? That's where Google is in search. It doesn't matter how much money Microsoft spent, they can't catch up. That's what happened with YouTube, with iOS, with Windows, with Instagram. There are inherent structural reasons why it's really difficult for them to lose their position.

We don't see equivalents of those in large language models yet. The big challenge, of course, is that we don't know how this market is going to evolve. Some people have a problem with that statement too, but you're at the very early stage of this market. You don't know how it's going to evolve. We also don't know how the science works, and we don't know how the science will change. So things may happen that mean there are network effects, but right now there aren't. You're in a situation where you've got three to six companies spending a lot of money, and then maybe up to a dozen companies willing to be three to six months behind. But there's no mechanic that means if one of them gets ahead, they'll get further ahead.

What we see now is that Anthropic got coding working and found product-market fit with it. There's no inherent reason why it's impossible for Google and OpenAI to catch up with that and get their own things working. They may do. They may fail to execute. But you can't predict that. The models are all kind of the same, with the same benchmark scores and the same evals, because they're using the same infrastructure, the same training data, the same algorithm. That's what you'd expect.

So then you get to a question. Firstly, would there be mechanisms — for example, a use of capital — that would allow one of them to pull ahead? Secondly, how far up the stack do they go? Is the model the whole experience? Is the model what we use? Does the chatbot become the universal user interface, which is a question we've been asking for three years? Or does mainstream mass-market adoption — not using it once a week or a couple of times a month, but using it all the time, every day — need this to be wrapped in apps, use cases, tooling, go-to-market, data and everything else we think of when we see software? Your phone runs 50 different databases every day, but they're all different. They run on SQL or whatever, but that's not the point. That's not what the product is.

So how far up the stack can the models go? Or do the models have to be APIs used by software built by other people? In the end, if there are 500 of these things, they can't all be built by Anthropic and OpenAI. Not because you can't get the model to write the code, but because you've got to build the business, work out what it would be, build a sales force, do go-to-market, and do all the other things that happen after you've written the code. I don't know why they would be able to do that any more than Microsoft could, or Apple could.

That gets you to a preliminary thesis: models are a crappy UX. Chat is a crappy UX. Models are commodities. There need to be loads of apps. The model company can't build all of those. So the models look like they're going to end up as commodity infrastructure.

Now you get a bunch of questions. To your point about pricing, clearly right now we're in a moment of extreme pricing crunch. But all the multipliers in that are going to get flung all over the place in the next year. You've got a trillion dollars of CapEx coming down the pipe. The models get fifty, a hundred, two hundred times more efficient. The chips keep getting better. You don't know what the next model will be. The next models, on past history, use way more tokens. The only thing that's really, really got product-market fit is coding. Imagine if we had something with product-market fit that lots of people wanted, rather than coding, which in the end is tens of millions of people, not hundreds of millions of people. So all those levers are going to get swung all over the place.

Where might it settle out in the end? You can throw analogies out there. What happened in telecoms — in mobile — is that mobile has marginal cost. A lot of people don't get this. Mobile networks have marginal cost. More users using it more means you have to build more network, and that costs money. So basically we went to paying fifty or a hundred dollars a month, every month, for twenty years. Not quite as simple as that, but we pay roughly the same amount of money, and we now use a thousand to two thousand times more data. It's a trillion-dollar annual revenue industry that pays two hundred billion dollars a year in CapEx, and the stocks have gone nowhere in twenty years.

So it's a very big industry that spends a lot of money on CapEx and earns a lot of money, but there isn't much profit, and it's not a great investment, because all the value went up the stack. All the cool stuff was built by other people.

Now there are other comparisons you can make. You can point to cloud, where there's a pretty good return, and there's differentiation between Google, Microsoft and Amazon. But there again there's no network effect.

Bernard Leong: Obviously there too, businesses use all of them [every AI model].

Benedict Evans: You run on AWS, but AWS doesn't get a percentage of your Uber trip. They don't get that much value or leverage further up the stack. They're an essential component. You could point to chips, and the value of the chip comparison is that it gets more expensive with each generation. I don't mean Nvidia. I mean TSMC, Samsung: semiconductor manufacturing. Because of Rock's Law, the cost of building a leading-edge fab keeps rising with every generation. So now there are basically three chip companies (or two, or one, however you want to count it), as opposed to thirty or forty years ago when there were dozens, because each generation gets progressively harder.

So it might look like telecoms, it might look like cloud, it might look like semis. The fiber comparison is interesting. At first sight the fiber comparison is dumb, because the fiber bubble twenty-five years ago was built out way ahead of demand.

Bernard Leong: Nobody was using it [fiber] then.

Benedict Evans: There was all this empty fiber, whereas this is being built out behind demand. You can sell every token you can make. However, the interesting part of the fiber comparison is the massive disaggregation. What happened with fiber was that it got split up: you had ducts, dark fiber, lit fiber, wavelengths, IP streams and probably some other stuff. Then you had people trading bandwidth on top, and there's always somebody with slightly newer equipment and slightly lower costs. So everyone ends up selling it at marginal cost or less.

Bernard Leong: It feels like tokens are the new fiber.

Benedict Evans: That's the question. The more the market gets disaggregated — you've got Mark Zuckerberg saying, if I've got spare capacity I'll sell that, and people trying to trade it and building ETFs on top of it — the more the market gets fragmented. In parallel there's this other question of how long you need the newest model. I'm being slightly unsystematic in how I'm explaining this, but how long do you have to have the newest full-fat model, and at what point, and how many use cases, can you use a six-month-old model, or a three-month-old model, or the open-source model? Or run it much more slowly, or even run it on your own equipment? How many use cases have to have the full intelligence, the full model running in the cloud at the massive price, using lots of tokens?

All of which is to say, it's like the bell-curve IQ meme — the person at 50 IQ and the person at 150 IQ agree: commodities tend not to be high-margin products, and compute tends to get cheaper over time. Then you can sit in the middle trying to analyze how many TPUs Google has this year. I'm being slightly unfair, because that's the hard thing to do. But what does that actually tell you?

There's a story that occurred to me. I'm sorry, I'm monologuing. When I was a baby technology analyst back in 1999, there was a company that sold computer components. Chips, memory, keyboards, monitors, printers, all of this, sold online. It had been converted from one of those companies that had twenty pages in the back of a computer magazine, if you remember.

Bernard Leong: The catalog.

Benedict Evans: A computer magazine in the nineties ran to 300 pages, and the back 150 pages were one insert after another: twenty, thirty, forty pages from different resellers, basically selling the same stuff. You'd page back and forth and say, this one's got the printer I want for $260, and that one's selling it for $250. Anyway, they make a website. They're doing (I'm making this up) a hundred million a year in revenue. We all go up to see the company, and an investment banker has a mandate to float it at seven or eight hundred million.

On the way back we're discussing it, and this veteran banker says: it's a low-margin reseller, one-time sales. You can spend a lot of time talking about Amazon and the internet and consumer adoption and broadband and HTML and returns and pricing and the management team. But it's a low-margin reseller, one-time sales. That's the thesis.

The question I'm asking is: explain to me why people who are using similar infrastructure, similar models, similar tokens to deliver a similar product — and there are going to be three to six of them, and it's going to get fragmented with better and faster and newer and older and cheaper and more expensive — why is there pricing power here?

Bernard Leong: That's the point, and I agree with you on that. You're also very clear that the only UI, even with agentic AI arriving this year, is still the chatbot interface. Not much has changed.

Benedict Evans: It still doesn't change. There's another strand worth pulling on here. If you think about the classic S-curve framing of how technology evolves — the way it worked with mobile, the way it worked with the internet — there's a period at the beginning where it's flat and not working, and no one cares, and it's dumb and stupid. Then there's a period where everything starts working and the curve shoots up, and it's really exciting. Then there's a period where it's all happened and it's been done, and it flattens out and gets boring. That's where smartphones are now. It's where airliners are now. It's where cars were until electric happened. There's not much difference between this year's model and the model from five years ago. Everything's been done, and it's become a mature, boring product.

The reason I mention this: when you're at the beginning of that phase, it's very unclear how this is going to work, and there are many potential paths. Take a car in 1905. You need to choose the spark plugs, and you drive along with a bundle of fifty tires on the back because you get a puncture every three miles, and you generally have a mechanic with you — if you're rich enough to own a car, you're rich enough to have a mechanic on staff, and you need it because the car breaks down every twenty minutes. It's incredibly exciting, but nothing works. Same with PCs in the late seventies and early eighties. Incredibly exciting. If you were using one then, you thought, this is great — what do you mean it doesn't work? The normal experience when we were kids was you'd be tapping away and you'd look up and the screen had frozen, and you'd crawl under the desk and unplug it and plug it in again and hope you hadn't lost more than ten minutes of work.

So everything's amazing and nothing works. The stuff that has to happen for it to work doesn't exist, and you don't quite know how that's going to evolve. That's where mobile was in 2005.

Bernard Leong: Where do you feel we are in this cycle?

Benedict Evans: The cycle is faster. But my point is: yes, the cycle is faster, but are you completely confident that in 2005 you would have predicted how mobile was going to work? No. Even in 2010, a lot of really clever people in tech were convinced this was going to play out like Windows versus Apple: Apple was closed, so Apple would get obliterated.

Bernard Leong: That didn't work out.

Benedict Evans: That was just not what happened, and we could explain why. I didn't think it was going to happen, and I was right. But I'd been wrong about plenty of other things. I thought Windows Phone might have a chance, and it didn't, and I can explain why too. The reason I'm saying this is that there's a stage in the evolution of a market where you can have strong, clear opinions about how it's going to work. And there's a stage where, if you're still arguing about that, you're an idiot — if you're still arguing about why anyone would use an iPhone, or that you should be able to install any software you want on it, that was an argument from ten years ago, pay attention. But there's a stage at the beginning where you might have opinions about some of these questions, and you're probably not even asking the right questions.

That, I think, is where we are with this stuff today — where the PC was in the eighties, where the internet was in the mid-nineties, say 1997, where mobile was in 2006 or 2010. You just don't know how this is going to work, and you don't know what the levers are going to be. The funny thing about mobile was that we thought it was working before the iPhone. It just wasn't growing very fast. It seemed like it was kind of working.

Bernard Leong: We thought Nokia was going to dominate it.

Benedict Evans: Nokia, Windows, BlackBerry. It wasn't clear that something was going to come along and blow the roof off. We all thought mobile was basically a PC accessory — what do you do on the mobile internet, what are the mobile use cases? What actually happened is that now you say "desktop internet," and mobile made the PC a smartphone accessory. The smartphone became the center of tech and the center of most people's usage. No one in 2000 was saying that. Or 2005.

Bernard Leong: Because they couldn't imagine that the mobile phone was actually the personal computer, and that it could grow to that large a market size.

Benedict Evans: Almost literally. There's a great quote from one of the co-founders of BlackBerry. He looked at the iPhone and said: if this thing works, we're competing with a Mac. Because it's still quite hard, if you're not in tech, to understand that the iPhone is a Mac — it's a PC in the general sense. It was competing with phones that had some compute added, and then suddenly, no, this is a whole PC, and it had four or eight gigs of storage, which was a ludicrously large amount, and it was running a PC operating system. That's why BlackBerry and Nokia were completely screwed. The analogy I used to give: it was like you were in the ocean liner business, and then you look at jet airliners and think, I don't have any relevant skills at all for making that thing.

Bernard Leong: Let me put it another way. From my observation, as someone living between East and West, this batch of US companies — your Anthropic and OpenAI — are behaving more like Chinese companies. Let me give you some features. They work 996, and these last two years all the companies are expanding into Saturdays. People are working like crazy. Some of the AI researchers I know are in China. Alibaba is not just eBay, it's also PayPal, and all the other services too.

Benedict Evans: Much more horizontal.

Bernard Leong: Horizontal and vertical.

Benedict Evans: You've got a much wider matrix, with many rows and columns, and lots of companies have lots of cells filled in.

Bernard Leong: This batch of AI companies don't just supply, say, Cursor — Anthropic supplies Cursor, but now it's also coming in to compete with Cursor.

Benedict Evans: I'm not sure about that. On the 996 thing, the contrast is: what was the Valley like in 2015 or 2020, when Google had won and was flush with cash, and there were a lot of people there not —

Bernard Leong: Working for the perks?

Benedict Evans: A lot of people weren't working very hard, because the money was just there. I'm not sure that was anything other than where you were on that S-curve: is this the point where everything is amazing and exciting, and you have to build everything yesterday? There's also an interesting contrast worth pointing out. Put very simplistically: incumbents always make it a feature. Google's response will be to make it a feature.

Bernard Leong: That's right. Search is a feature.

Benedict Evans: They'll add it to search, add it to Gmail. It empowers new capabilities and makes the existing stuff better. This is also what Apple is doing now. Whereas if you're not the incumbent —

Bernard Leong: Anthropic has Cowork.

Benedict Evans: You don't have places to add it as a feature. So for Anthropic and OpenAI, the question is: what are the use cases for this? What do you do with it? How do you communicate to people what they would do with it? The position at OpenAI late last year was, everything everywhere yesterday. We'll try a social video app. We'll try an app store. We'll try another app store. We'll try a checkout, an e-commerce checkout flow. We'll try an ad business — all sorts of stuff, because we don't know what it is, so we'll try all of them and double down on the one that works. It turned out the one that works is software development. At least that's the first one that's working, and Anthropic got that working first. So OpenAI is pivoting to go into that one.

Bernard Leong: But Anthropic also indirectly starts to kill off Cursor usage. So you get this cannibalization. This is very different from the US internet, where you're always in your lane —

Benedict Evans: That's not true. That's total horseshit. People have spent the last twenty years saying Google is killing this startup, or Apple is killing that startup, if you're making a platform. There were a whole five years of people arguing about big tech companies killing startups. Go back and look at the way Microsoft and Windows evolved. There was a program called Sideways that you could buy in the late eighties. It cost a hundred and fifty dollars in late-eighties dollars, so three hundred now, whatever the inflation is. What Sideways does is let you print out in landscape. That's it. It lets you print your spreadsheet in landscape.

Bernard Leong: I remember Sideways.

Benedict Evans: That's a hundred and fifty dollars, and today it's a checkbox in the print dialog box in Windows and macOS. It's the same thing if you bought a spellchecker — another, less absurd example. Spellchecking was a whole software category. You might buy PC magazines and they'd have a group test of ten different spellchecking programs, all two or three hundred dollars. The way it worked: you'd write your document, save it, then open the spellchecking program and point it at your document, and it would run the spellcheck. So there's a whole bunch of classes of stuff that naturally gets integrated.

The question is how far that integration goes for an LLM. How far up the stack do you go? Is it agentic coding? Is that inherent to the thing? You could take the maximalist view and say the way these models will be able to do, quote unquote, everything is because they'll be able to write their own code to do stuff.

Bernard Leong: Yes.

Benedict Evans: You won't have the model doing everything itself from first principles in tokens. Instead it'll make a Perl script to do it. The model will write a piece of conventional, traditional software to do that thing, so everything will be done through models generating their own intermediate layers of code.

Bernard Leong: So I come back to your favorite analogy. In the 1980s, people were given a spreadsheet but didn't know what to do with it. Essentially the spreadsheet became Tableau and all the different kinds of software people built on top, because they discovered they knew how to use a spreadsheet. To me, Cowork is the equivalent of a spreadsheet today, except that it could create a Word document, create a coding script, do exactly what you were saying. Is that maybe the change in UI? Or is it that you still have to orchestrate it by telling it what you want, which makes it difficult to visualize what's going on?

Benedict Evans: Cowork, in a sense, is what I'm describing — you want the model to do a thing for you, and the way it does that thing might be using git, or writing an Excel macro.

Bernard Leong: It's not doing all of that for me.

Benedict Evans: It's not doing everything in the LLM as the LLM. The LLM is using a tool. However, I don't think that solves the underlying problem of a chatbot as UX. There are two fundamental problems here that drive software creation, company creation: the blank-screen problem and the jagged-frontier problem.

The blank-screen problem is: how do you know what you need? You don't just sit down and know. How do you know what buttons there should be, what the questions are, what the flows are? Most people couldn't sit down and draw a flowchart of precisely how they're going to do a job. When you buy software, what you're buying is a whole bunch of institutional knowledge and a whole bunch of thought about what the problem is, what the workflow should be, how it should work, and what options you should be given at each step. You're not working any of that out from first principles, sitting there with your eyes shut thinking, okay, how am I going to explain what I want? When you buy software, that's what you're getting. They worked that out for you. Most people are not tool creators in that sense. The person who's a really good graphic designer is not also the person to design InDesign.

The second is the jagged frontier, which I think is a better phrasing than hallucination. These models do some things very well and other things not very well.

Bernard Leong: Right.

Benedict Evans: Each new model maybe has the jagged bits in different places. Some people say hallucinations are solved, and you think, well, for you they are, but not for my use cases.

Bernard Leong: Actually, no —

Benedict Evans: Let me finish the point. Clearly we do not have AGI. There's stuff these can do and stuff they can't, and that comes in unpredictable ways. We don't necessarily know why it does this and not that. You can't tell from the command line which is going to work and which isn't. As a normal user, if you're not reading AI papers every day, you're not going to know — of course that task will work brilliantly, and this task you've got to ask in a completely different way, and it may still not be right. So you've got this jagged frontier of what the models are good at. Then there's a jagged frontier of what we can intuitively understand it's going to be good at or not. Then there's a jagged frontier of what actual use cases I have that are important and valuable. With software development, those all line up. But there are a bunch of others where they don't line up. Never mind what the tokens cost, which is another thing — this use case might use way more tokens than that one, and you might not realize why, or we might not even know why. The one that uses loads of tokens might not be worth very much either.

So there's all this jaggedness in how the bits fit together. That's why you hire a software developer — you hire a software company that sat down and worked out, this is how it's going to work. It will work if you do it like this, if you give it this data, if you frame it like that, if you pre-load it with this context, and we'll make sure to tell you if you ask it something that won't work, or that you probably want to check that number. You wrap it in tooling, use case, go-to-market and deep thought about what the actual problem is. The hard part of building a software company is not writing the code.

Bernard Leong: It's the distribution, and it's everything.

Benedict Evans: It's realizing that the problem exists, because very often the people who have the problem don't see it. The people the software is solving the problem for do not realize that's the problem. And the way you're solving it — maybe three or four other people have tried to solve it before and failed, and you've worked out the right way by turning it into a different problem that didn't look like that problem. You worked out the right go-to-market and the right way to get the data to make the whole thing work. If you're doing accounts payable, or color grading in a video editing shop, those aren't your skills. You're great at color grading. You're not great at working out the right insertion point, or even seeing that the problem exists.

That, to me, is the challenge to the whole thesis that the model will just go all the way up to the top. If you believe the models will scale to be super-general — whatever terminology we're using now — if you believe they'll understand everything I've just said, know all of it and do it all unprompted, then fine. But then we've got bigger problems than wondering about the future of enterprise software.

Bernard Leong: The bigger problem, speaking as a practitioner doing this every day, is exactly what you said about the jagged piece. I like that analogy — I couldn't articulate it. The way I'd put it differently: AI software is very good at deceiving you with an output you think you want. The demo is great, but once you get into a production situation, you start to see where the flaws come up.

Benedict Evans: It's very good — there are all sorts of ways to describe this — it's very good at giving you what a good answer would probably look like.

Bernard Leong: That's right. But it's not the best answer.

Benedict Evans: That may or may not be what you want. It depends on the use case, on who you are, on why you asked. A trivial example: a couple of years ago, when this had just started working, I went to speak at an event and they asked me for a long biography, and I didn't have one, so they made one in ChatGPT, and it was full of mistakes. But the thing that occurred to me is, if I'd used ChatGPT to do that, I could have fixed it in thirty seconds, because I'm an expert on myself. I can read the biography and say, no, not that company, not this company, change this, delete that paragraph, great. If you didn't know anything about me, it wouldn't be very useful for you.

So, what is a good answer? There are some questions that don't need a precisely correct answer, or where there is no precisely correct answer. Here's a picture of a new CPG product — suggest ten advertising slogans for it. That doesn't have a right answer. It has better and worse answers, but there's no precisely correct answer you require. So the question is, how do you turn things into generative AI questions? If you think about the last ten or fifteen years of machine learning, there was a bunch of stuff that was obviously a machine learning question. There were also companies that said, we realized we could turn this into pattern recognition. Or, we realized we could turn this into image recognition — it didn't look like image recognition, but we worked out a way to turn it into image recognition, and then you can automate it. So there's a whole question of the stuff that's obviously an LLM use case, versus the stuff that right now looks like it's really not an LLM use case, like law. Or maybe that's the wrong way of putting it — it looks like it should be, except that it's really hard, because you need to get it right. Then there's the question of how you manage all those problems.

Bernard Leong: Can I ask you this. When people start talking about job displacement, I always argue in the following manner. The problem is that the part of a lawyer's job that's manual, repeatable tasks gets taken away by the AI. But the creative aspect of being a lawyer — putting in the correct clause, knowing how to defend the client as a solicitor — that part stays in the system, because the AI is not going to give you the best answer. It gives you an answer that looks deceptively like the real answer, but is not the correct answer. Am I right?

Benedict Evans: Two things here. Firstly, it's very hard to predict this. You see this if you try to backtest it. You could say half the purpose of computing in the twentieth century was to automate accounting. We had adding machines, punch cards, mainframes, databases, data processing, spreadsheets, ERPs. It's all trying to automate accounting and bookkeeping. Go and look at the US census data: the number of people called an accountant or a bookkeeper basically goes up in a straight line all through the twentieth century. This is where people pull up the Wikipedia entry for Jevons paradox, which is basically price elasticity — if you make it cheaper to do something, do you do the same work for less money, or more work for the same money? Or maybe you do more work for more money, because you've got a new ROI. But if you stop and think for a minute, that's not quite what happened, because an accountant today doesn't do the work they would have been doing fifty years ago, only more of it. They're doing a whole bunch of other things they wouldn't have done then, because it would have been impossible.

Bernard Leong: It makes them able to do more work.

Benedict Evans: It also enables different work, a different character of work. If you were an investment banker forty years ago and you wanted to do a DCF, that would take all week. Now you can type it in and you've got a new DCF. So the stuff accountants are doing now is not what they were doing fifty years ago, and the fact that you could automate what they were doing fifty years ago unlocked all sorts of other stuff. The inverse of this is what happened most obviously to newspapers, where the internet didn't really change what it was to be a journalist, but journalists were being paid through a light manufacturing and trucking business that had a monopoly on local advertising.

Bernard Leong: That was going away.

Benedict Evans: If you'd looked at the job of a copy editor and asked, does the internet change the job of a copy editor, the answer is no. Zero exposure to the internet — except that the salary was being paid by something completely different. That wouldn't show up in any analysis of whether the job is exposed. Same thing with Uber. If you'd asked what jobs were exposed to smartphones, everyone was talking about GPS. No one was talking about taxis.

So in a sense what I'm saying is: why don't we just only buy the stocks that go up? You can't predict this stuff. We don't have perfect knowledge of the future. There's a second answer, which is even simpler — the lump-of-labour fallacy. When you automate work away, you can always see the jobs that are going to go away, because they're right there, and you don't know what the new jobs are going to be. But the fact that you've automated that work away has unlocked new economic demand, new economics that allow the consumption of new things, and human needs are infinite. How many people are earning a living making podcasts now? Imagine predicting that ten years ago. You don't know what the new jobs will be, but there will be new jobs. So yes, it will be painful, and there will be jobs that go away, but that's what's happened continuously over the last two hundred years. Two hundred years ago, ninety percent of us were peasants, worried the crops would fail. We spent the last two hundred and fifty years automating that away, and yet there are just as many jobs. So you need a theory for why this is different, and at that point there's a lot of hand-waving, because you can say this is happening way quicker — but is it? We're three years in and nothing's happened yet.

Bernard Leong: But you see less hiring —

Benedict Evans: At best you'll get a lot of argument amongst economists about what's really going on. I don't think there's any sense amongst economists that we're clearly seeing a decline in hiring yet. The broader job question — I want to come back to where you started, which is, what is the actual job? What are you actually hiring them for? Why did you hire the law firm? Did you hire them to make the contract? Sometimes the answer is yes. Sometimes the answer is, I hire them to make a plain-vanilla contract that does what all the contracts say. And somehow, when you actually hire a lawyer, there's always a reason why it's not that.

The analogy I was thinking about is Amazon and retailers, because Amazon can get you the thing. Set aside that there's a bunch of stuff not on Amazon, like luxury goods. But for the sake of argument — books would be the best example. If you know exactly what book you want, Amazon can get you that book. But how do you know you want that book?

Bernard Leong: You have to search for it.

Benedict Evans: When you go to a bookshop, you generally walk out with books you didn't know existed.

Bernard Leong: I still go to bookshops.

Benedict Evans: So the question is, is the purpose of the retailer to be the most efficient endpoint to a logistics chain — in which case Amazon is probably more efficient, though not necessarily for groceries; it's probably more efficient to go to the shop on the ground floor of your apartment building and buy milk than to order it from Amazon. But if you don't know that the thing exists, the shop is doing something else. Book buying is a leisure activity. It's about service and experience and suggestion and curation and recommendation. Amazon has eight or nine hundred million SKUs. You can't go to Amazon and say, I feel like buying something nice for my house. That's not a SQL query.

Bernard Leong: So the bookstore's job has changed so much.

Benedict Evans: My point is the function has separated. Is it just to get you the thing in the most efficient way possible, or is it something else? Amazon can get you the thing, but it can't really do the something else. The LLM can make you the thing, but can it do the something else? Another way to think about it: what LLMs do, absolutely inherently, is give you the average. They tell you, this is what most people would probably say. Is that what you want?

Bernard Leong: Maybe, if you have —

Benedict Evans: It depends. You go to your lawyer: I want an NDA for my company. Do you want a different NDA from what everyone else has, or do you just want the same NDA as everyone else?

Bernard Leong: It's probably the same NDA.

Benedict Evans: Maybe you want a lawyer to check it, but you probably won't be paying the lawyer to write a document that's exactly the same as the one everyone else uses. You probably shouldn't be, even now — and certainly with AI you won't be. But the question is, is that what you were getting from them? Were you getting something else? My point is, do you want the average? Do you want the mean of what everybody would probably do, what everybody would probably say? The answer very often is yes. How do I make a risotto? How long should I cook the rice for? I'm not looking for an original answer. But then you can extend that — here's a picture of my fridge, what should I cook for supper? That will work now. Try it. It will work, probably better than you'd expect.

Bernard Leong: I believe you.

Benedict Evans: It won't give you some radical, weird new thing you'd never thought of. It'll say, I can see some spinach, I can see some ricotta. But do you want the average? That's the point. Why do you go to the lawyer? Why do you go to Bain or BCG or McKinsey? Is it that you just want the deck? There's a dumb and stupid version of this, because LinkedIn is full of people who'll say, look, Claude made me this deck. The narrow criticism is that the deck is terrible — if McKinsey gave you that, you'd throw them out.

Bernard Leong: Yes.

Benedict Evans: But you have to distinguish between the things that will get fixed and improve, and the things that are a different question. That's not why you hired them. Sometimes it is — maybe if you're doing a plain-vanilla private equity due diligence and you just want a deck that says this is the market structure and these are the competitors, then yes, Claude can do that for you, or you can do it vastly quicker, and you won't pay them as much. But maybe that's not what you're paying them for.

I was talking to a friend the other day who was at one of the big four accounting firms, and he said: the CIO at the client wants to migrate from one version of Oracle to another, and has paid one of the big four to do a study on whether they should do this. The answer was yes. The CEO says, this is a lot of money, is this really the best way to spend it? So the CEO hires another of the big four for a second opinion, on the basis that they won't get the conversion work, so they're free to give an honest opinion on whether they should actually do this. The answer is yes, maybe, if you double the normal depreciation life. And also you should realize: number one, the partner at the first firm is best friends with the CIO. Number two, the CIO really wants a big shiny project. Number three, the CEO is about to retire and doesn't have much political capital. So what the CEO actually wants is a document to tell the board what he and the board already know.

Bernard Leong: It's never about the problem.

Benedict Evans: That's an extreme case. But what is it that you're hiring them for? Sometimes you're just hiring them to get the deck. You're just hiring them to get the code, to get the thing. You just want what everybody would probably make. At the extreme — the people most upset about AI — it's like using AI to write romance fiction, because it follows some very straightforward, well-understood conventions.

Bernard Leong: It follows good writers' form, a style of writing.

Benedict Evans: The challenge is, do you want something that isn't the average? This is an interesting theoretical question. An LLM can make you more prog rock, or more punk, or more new wave, or more stuff that sounds like not-very-good Nirvana. But it won't know that everyone is really bored of disco and prog rock, and the economy is terrible, and we're worried about this and that, and so we want a completely different kind of sound, and punk will do that. Maybe that's not why people wanted punk, but it won't know people want something different like that, and what it would be. If it did, that would be very different — maybe a different amplitude. It's easy to see that you can use AI to make more of the stuff we've got now. It's a different problem for it to know that we would want something different, and what we would want. How will it know? By default, that would be something outside the training data.

Bernard Leong: Yes.

Benedict Evans: So it's a long way of circling around your question: how often is it that you hire something because you want the average of the training data, and how often is it that you want something out of the training data, something that's not the normal suggestion?

Bernard Leong: Correct. If you're a hedge fund manager, you're looking for the alpha, so you wouldn't be looking for what everybody agrees on. You'd be looking at what everybody disagrees on.

Benedict Evans: But you can't just go to ChatGPT and say, give me some really stupid investment ideas.

Bernard Leong: Of course not. What you're saying is that the LLM is solving a lot of tasks where most people could agree this is the right way to do it — except we don't know what the new paths are.

Benedict Evans: This comes back to automation, doesn't it? The point of automation, right back to the nineteenth century, is: take this one task and do it exactly the same way, over and over again. So now with AI you can take a bunch of tasks you previously needed people for and say, do the same task — maybe not in exactly the same way, because it's a probabilistic system — do the same thing over and over again.

Bernard Leong: And redesign it in a different way so that it's efficient.

Benedict Evans: It's like McDonald's making the hamburgers look artificially irregular. Do you want it done exactly the same way every single time? The answer is sometimes yes, but not necessarily. That's not necessarily why you're going to somebody.

Bernard Leong: But if I take your point, then creators will still have a job, whether we have AI tools or not, because there's always a human need to create new things, build new things, come up with new angles. I'll use the example of people saying Go is dead because AI plays it. But we've been seeing a lot of Go masters now playing moves that previously were taboo.

Benedict Evans: I think that's a bad analogy. It's like saying no one will go running anymore because we have cars.

Bernard Leong: Of course people still go running. But all I'm saying is, does that mean nothing really changes — it's only the reconfiguration of what we think?

Benedict Evans: I'd argue against everything I've just said. There's a view that says what we've actually been doing for the last two hundred years is automating higher and higher-level human functions. You start by automating human beings as beasts of burden — hauling things, carrying things. You automate legs first.

Bernard Leong: And then you automate arms, and fingers.

Benedict Evans: Now you go to the brain. So you'll reach the top, and there'll be nothing left. It's a neat framing. I'm not sure it's very convincing. How would you know that's what it is?

Bernard Leong: They're not doing very well yet.

Benedict Evans: How is it that the things it's automating are all the things people want, all the things people need?

Bernard Leong: That is the underlying question, I suppose.

Benedict Evans: It is, and it's a sort of unanswerable thing, because unlike every other platform shift, we don't understand the science of this. You didn't know how the internet was going to evolve, but at a deep level you did — you knew PCs cost two or three thousand dollars and there were only a hundred million of them in the world, and there weren't going to be five billion people with a PC next year, and the telcos weren't going to give everybody in the world fiber by the end of 1998. You knew some basic constraints of what could happen. Same with mobile. You didn't know what the iPhone 4 would look like, but you knew it wouldn't have a retinal implant. You knew the basic physical constraints of where this could go, roughly what it cost, roughly how many were getting sold every year.

Whereas with LLMs, because we don't have a good theoretical description of why they work so well — this is the scaling thing — we know empirically that they have scaled, but we don't have a good theoretical explanation of whether that continues, or how long. So we can't predict what we'll be able to do with this stuff. Same with cost. Maybe we have a paper tomorrow that says you can get the same results for five percent of the compute. It's maybe unlikely, but we don't have a way to state that definitively.

Bernard Leong: The theoretical physicist's way of answering why they work so well: it just happens that you put a billion tokens in, you get the best gradient descents of AI working. That's probably how I'd look at it. But I'll have one final question. Listening to you and going through your work over the last ten years, I think asking you for predictions is the wrong question. I have a different question this time. What are the indicators that would tell you AI has actually eaten the world, or is still eating the world — where we are on the S-curve? Do we have to see something very unpredictable that hasn't shown up yet, or are these things always incrementally changing?

Benedict Evans: You could get a bunch of people agreeing and disagreeing on whether we've had a step change in capabilities, or continuous improvement of the same thing in principle. Have we had some radical acceleration, or is it just that we got coding to work? It's certainly the case that people who only test this every couple of months, and haven't looked at it since the beginning of the year, do not really understand what we have now. But — I don't know. It's tough to make these statements of magnitude. It's like asking whether mobile was a bigger deal than the internet. I don't really know how one would answer that.

Bernard Leong: That's a meaningless question.

Benedict Evans: I don't really know how one would answer it. I'm trying to think how to put this. Imagine you're an accountant seeing the first spreadsheets in the late seventies. It's life-changing. It does a week of work in an hour. That's what it's like to be a software developer looking at Claude Code now — it's mind-blowing. It completely changes what it is to do the job. However, now imagine you're a lawyer looking at a spreadsheet in 1978. Okay, this is very clever, my accountant should see this, but that's not what I do. Word processors exist at the same time.

Bernard Leong: Yes.

Benedict Evans: If you were a lawyer looking at a spreadsheet, you'd think, this is very clever, I might use it next week to do my timesheet. Set aside that you needed fifteen grand worth of computer to run a spreadsheet in 1978. Literally, an Apple II with enough memory and the disk drives was ten or fifteen thousand dollars. But set that aside. You're a lawyer looking at this thinking, that's great, I could use it to do my timesheet next week, but I'm not going to use it every day. That's not what I do all day. It's the same now with LLMs. Some people are the accountant looking at the spreadsheet. Most people are weekly active users and monthly active users, not daily active users. Even with thirteen-to-nineteen-year-olds, more people are weekly or monthly active users than daily active users. So most people are still the lawyer looking at the spreadsheet, saying: this is very clever, it's quite useful, maybe next week. And I'm not sure the solution to that is a better model. I think the solution is that now you have to wrap it in different products, and think about how you create a use case that's set up like this: I've realized you have this thing you didn't notice you were doing every day, and I've made a thing that solves it for you.

Bernard Leong: Because the interface isn't a do-everything interface.

Benedict Evans: It isn't, and making it work is still fiddly. Take an example. I travel a lot, so I need to put my expenses together. What's the right way to do that? I can look at the flight, the taxi, the hotel and type them into a spreadsheet. Option one: I put all the receipts together myself. Option two: Gmail could look at all those receipts as they come into email and automatically drop them into a Google Sheet. Am I going to set that up? Shouldn't that be Google's job? Option three: I could use a fintech or a corporate card that does it automatically. I sign up to a new card, and that's one of their features — they've built it, using machine learning, just as Google would. The one thing I'm not going to do is load up Claude Code and say, hey, can you log into my bank and my Gmail, and here's my Uber account, can you calculate which of these are expenses. That would be insane. That would be a really, really stupid way of doing it. You could. But why on earth would you?

Bernard Leong: Good point. That's a good place to wrap. Many thanks for coming on the show. Very simple question: where can my audience find you? I'm a subscriber to your newsletter, so I'm just going to recommend it.

Benedict Evans: I need to check whether you're a paying subscriber.

Bernard Leong: I can send you the order receipts.

Benedict Evans: Over the last ten years. My parents had good SEO, so if you Google "Benedict Evans," there's my website. I publish a presentation twice a year about what I'm thinking about, and then there's a weekly newsletter, which is my notes for the week on what's going on, what was interesting, and what it meant.

Bernard Leong: Thank you for the spreadsheet analogy. I think about it every time I think about product.

Benedict Evans: Thank you.

Podcast Information: Bernard Leong (@bernardleong, LinkedIn) hosts and produces the show. Proper credits for the intro and end music: "Energetic Sports Drive". The episode is mixed and edited in both video and audio format by G. Thomas Craig (@gthomascraig, LinkedIn).
