Apple is now paying Google a billion dollars a year. I think they’re still winning on AI.


A few weeks ago I shipped an optimisation to the on-device transcription model in the app I’ve been building. The same hour of audio that used to take four minutes to process now takes thirty seconds on my iPhone 16 Pro. Eight times faster, on the same hardware, in a single update.

That’s the moment that made me look up and notice the gap between two stories. The first is the one everyone’s telling: Apple is behind on AI. Apple has just agreed to pay Google around a billion dollars a year to use Gemini for the next generation of Siri and Apple Intelligence. The second is the one I’m seeing in the work, where the on-device performance ceiling has been quietly compounding for almost a decade. Both are true. They describe different races, and only the first is being covered.

The story everyone’s telling

The conventional wisdom on Apple’s AI position is grounded in real, defensible facts. Apple Intelligence missed its 2025 deadlines. The personalised Siri features were quietly punted to 2026. Mark Gurman at Bloomberg broke the Apple-Google deal in November 2025: Apple paying around $1 billion a year for a custom 1.2-trillion-parameter Gemini model to power Siri and the next generation of Apple Foundation Models. Eight times larger than Apple’s existing 150-billion-parameter cloud models. Apple’s own statement said Google’s technology provides the most capable foundation for Apple Foundation Models.

Ben Thompson at Stratechery spent most of 2025 framing this as “Apple Retreats”, and on its own terms he wasn’t wrong. Apple’s cloud LLM ambitions failed. They tried to build the world’s best foundation model in-house and decided, after spending real money, that they couldn’t. The Gemini deal is the bill for that admission.

There’s no reframing of those facts that makes them disappear. Any post that tries to deny Apple’s cloud AI struggle is dishonest, and I’d rather lose the argument than make the dishonest version of it. Apple is behind on AI in the way the press is using the phrase.

The phrase is doing a lot of work, though, to hide a more interesting question, which is: behind at what, exactly?

The story I’m seeing while building something

Apple introduced the first commercial on-device neural engine in a smartphone with the A11 Bionic in iPhone 8 and iPhone X, announced on 12 September 2017. Eight and a half years ago. Before “on-device AI” was a phrase anyone outside Apple’s hardware team was using. The Neural Engine has been in every iPhone Apple has shipped since, getting roughly an order of magnitude faster across that span.

That’s the install base. Hundreds of millions of devices going back to 2017, every one of them with a dedicated neural processor, every one of them running the same Core ML stack a developer can target with the same code. As an app developer, that’s an extraordinary thing to be able to assume.

At WWDC 2025 Apple opened up the Foundation Models framework, which gives developers free, offline, Swift-native access to a 3-billion-parameter on-device language model with a Swift macro-driven API for guided generation. Free meaning no per-token cost. Offline meaning it works on a plane. Free of the privacy questions you have to navigate when you ship user data to a hosted LLM. Available on every iPhone recent enough to run Apple Intelligence.

When you read the cross-platform mobile ML literature, the consensus is striking. ONNX Runtime, which is Microsoft’s framework for running models across operating systems, explicitly recommends Apple devices with the Neural Engine “to achieve optimal performance.” That’s the cross-platform standard endorsing Apple’s stack as the fastest place to run a given model.

The thirty-seconds-per-hour transcription number from my own testing isn’t unusual. It’s the kind of performance headroom that quietly changes what a developer is willing to attempt. Features that need real-time inference on user audio. Summaries that run in the background while you’re listening. Topic extraction across an entire podcast feed without sending a single second of audio to a server. The kind of features you’d never ship if you had to assume an unpredictable round trip to a hosted model. Testing on recent flagships (Pixel 8 against iPhone 16 Pro, which are about a year apart), the gap on equivalent on-device workloads isn’t subtle.
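
To put the thirty-seconds-per-hour figure in more concrete terms, here is a rough back-of-envelope sketch in Python. It only does arithmetic on the numbers quoted in this post. The function name times_real_time and the 100-episode feed are illustrative assumptions, not measurements from the app.

```python
def times_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


AUDIO_SECONDS = 60 * 60   # one hour of audio
BEFORE_SECONDS = 4 * 60   # four minutes, before the optimisation
AFTER_SECONDS = 30        # thirty seconds, after the optimisation

speed_before = times_real_time(AUDIO_SECONDS, BEFORE_SECONDS)
speed_after = times_real_time(AUDIO_SECONDS, AFTER_SECONDS)
speedup = BEFORE_SECONDS / AFTER_SECONDS

# Hypothetical backlog: a feed of 100 one-hour episodes (an assumption for scale).
FEED_EPISODES = 100
feed_minutes_before = FEED_EPISODES * BEFORE_SECONDS / 60
feed_minutes_after = FEED_EPISODES * AFTER_SECONDS / 60

print(f"before: {speed_before:.0f}x real time, after: {speed_after:.0f}x real time")
print(f"speedup: {speedup:.0f}x")
print(f"100-episode feed: {feed_minutes_before:.0f} min before, {feed_minutes_after:.0f} min after")
```

On those numbers the model goes from 15 times real time to 120 times real time, and a hypothetical 100-hour backlog drops from 400 minutes of processing to 50. That is the difference between a feature you schedule overnight and one you can run while the user is looking at the screen.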

Bifurcation, not retreat

The Gemini deal isn’t a contradiction of any of that. It is, I think, the strategic choice that comes from understanding it.

Apple has decided that the cloud LLM market is one where the best you can do is be a customer of whoever’s winning. There are three credible providers, the per-token prices are still falling, and the moat for any one of them is shrinking quarter on quarter. Spending another two years and another set of billions chasing parity with Google or OpenAI on the cloud layer is a bet against the trend line. So they outsourced. The Gemini deal is Apple paying for a clean exit from a race they decided they couldn’t win at the price the next two years would have demanded.

That frees up the rest of their AI investment to compound on the layer where they actually lead. Owning the hardware, the silicon, the operating system, and now the developer framework on top of it is a stack nobody else can replicate. Google has Gemini, but Gemini Nano on Android sits behind a fragmented NPU layer that Google itself, in the LiteRT framework documentation, describes as a maze of vendor-specific compilers and runtimes that developers have to navigate across hundreds of SoC variants.

The compounding effect is the part that makes this strategic rather than defensive. More developers shipping on-device AI features means more apps that work better on iPhones than on Android. More iPhones in the install base means more devices ready for the next on-device feature Apple ships. More on-device features mean more reasons to keep buying Apple hardware over the cheaper alternative. None of that requires Apple to win the cloud LLM race. The Gemini deal buys time on the cloud layer while the on-device layer matures into the durable advantage.

Where I might be wrong

The honest version of this post acknowledges where the contrarian thesis could break.

The Tensor G5 in the Pixel 10 closed a real chunk of the on-device gap on flagship Android, with Google claiming up to a 60% NPU performance lift and twenty new on-device AI experiences at launch. That’s the install-base story Google needs, even if it doesn’t yet propagate down to the Pixel 7 and 8 generations a lot of people actually carry.

Core ML quietly falls back to CPU or GPU when it encounters operations the Neural Engine can’t run, and the practitioner reference for ANE work documents that there’s no public API to write custom ANE kernels. “The ANE is fast” is conditional on your model fitting its supported op set. The story isn’t a clean win on every workload. It’s a clean win on the workloads Apple’s compiler has been optimised against, which is most of them, but not all.

And the developer-platform bet only matters if developers actually adopt Foundation Models in numbers. The early signal looks good, but isn’t yet conclusive. What would change my mind: a 2026 retrospective showing low Foundation Models adoption across the top 100 iOS apps, or Google shipping a credible cross-Android NPU abstraction that closes the install-base gap before Apple’s compounding gets traction.

I’m not betting that Apple wins the AI decade. I’m betting that the part of the AI story everyone’s writing about isn’t the part that compounds.

What I think happens next

The companies that win consumer AI over the next three years won’t necessarily be the ones with the largest models. They’ll be the ones whose models run instantly and offline on hardware people already have, in apps they already use, every day. That’s a different race than the one the press is covering. It’s the race Apple has been quietly running since 2017, and it’s the race they’re now paying Google a billion dollars a year to stay in while they finish winning.

I’d take that bet.
