The Cost of AI: Tokens
Several patterns emerged from the past six months of enterprise AI spending. First, token prices dropped sharply. Second, companies spent more money anyway. Third, executives started celebrating engineers who burned through the most tokens. The tension among those three facts reveals something important about how AI costs actually work.
The numbers are straightforward. Per-token inference costs fell roughly 75% year-over-year, according to enterprise spending data from Ramp. Epoch AI research suggests the decline approaches 200x annually when both pricing and efficiency gains are accounted for. Competition among model providers, open-weight alternatives, and hardware improvements all pushed prices down. The collapse is real and significant.
But total AI spending moved in the opposite direction. Organizations spent an average of $1.2 million on AI-native applications in 2025, more than double the prior year, according to Zylo's 2026 SaaS Management Index. Nearly 80% of IT leaders reported unexpected charges tied to consumption-based AI pricing. The bill went up even as the unit cost went down.
The disconnect stems from how consumption patterns changed. Databricks CEO Ali Ghodsi singled out an engineer who spent over $7,000 in tokens during a two-week period in January. The company held a meeting where everyone applauded. Meta CTO Andrew Bosworth called token spending "easy money" with "no limit." The term "tokenmaxxing" emerged to describe maximizing token usage as a productivity metric.
Token pricing varies widely. Basic tasks on cheaper models can cost a few cents per million tokens. Complex computations on premium models run from $20 to over $100 per million tokens. Anthropic charges $25 per million output tokens for Claude Opus 4.6. Those are list prices. Actual costs depend on utilization rates, which rarely hit 100%. At 30% utilization, base inference costs on an H100 GPU jump from $0.0038 per million tokens at full utilization to roughly $0.013. At 10% utilization, the cost reaches $0.038.
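The utilization math works because the hardware cost is fixed: an idle GPU still costs the same per hour, so its cost is spread over fewer tokens. A minimal sketch, using the H100 baseline figure above (the function itself is generic):

```python
# Effective per-token cost scales inversely with utilization: fixed
# hardware cost divided by the fraction of capacity actually serving tokens.

def effective_cost(base_cost_per_m: float, utilization: float) -> float:
    """Cost per million tokens when only `utilization` of capacity is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return base_cost_per_m / utilization

BASE = 0.0038  # $/million tokens on an H100 at 100% utilization
for u in (1.0, 0.3, 0.1):
    print(f"{u:>4.0%} utilization: ${effective_cost(BASE, u):.4f} per million tokens")
```

Running this reproduces the figures in the paragraph above: $0.0038 at full utilization, roughly $0.013 at 30%, and $0.038 at 10%.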
The pricing structure creates a paradox. Falling per-token costs make AI seem cheaper, which encourages higher consumption. That higher consumption often cancels out the savings and pushes total costs higher. Appfigures data showed that image model releases drove 6.5x more downloads than traditional model updates. ChatGPT added 12 million incremental installs in the 28 days after introducing its GPT-4o image model. More usage means more tokens processed, which means larger bills regardless of unit price.
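The arithmetic behind the paradox is simple: a price cut is wiped out whenever consumption grows faster than the price falls. A toy illustration using the article's 75% year-over-year price drop and a hypothetical 10x growth in token volume (the growth factor is an assumption for illustration, not a figure from the article):

```python
# Hypothetical numbers: a 75% unit-price cut is canceled out once
# token volume grows more than 4x; at 10x growth the bill rises 2.5x.

old_price = 1.00    # $ per million tokens (hypothetical baseline)
new_price = 0.25    # after a 75% cut
old_volume = 100    # million tokens per month (hypothetical)
new_volume = 1000   # 10x consumption growth (assumed)

old_bill = old_price * old_volume
new_bill = new_price * new_volume
print(f"old bill: ${old_bill:.0f}/month, new bill: ${new_bill:.0f}/month")
```

This is the dynamic the download numbers hint at: cheaper tokens invite workloads that did not exist before, and the volume effect dominates the price effect.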
Infrastructure constraints are starting to appear. Anthropic cut off millions of OpenClaw users after the tool overwhelmed its systems, and the company shifted to pay-as-you-go billing instead of open-ended usage limits. Capacity is finite, and providers are prioritizing customers who pay per token over those on flat subscriptions. Gartner analyst Will Sommer told The Verge that AI companies would need close to $2 trillion in annual revenue by the end of the decade to cover infrastructure costs. Current pricing models do not support that math.
The operational costs extend beyond token prices. Semantic caching, prompt compression, and utilization optimization can reduce token consumption by 40% to 60%, but each requires engineering resources. Data preparation and cleaning add another layer of expense. RAG systems need structured data, which means dedicated engineering work before the first query runs. Then there are MLOps costs, monitoring infrastructure, and the labor required to defend against prompt injection attacks and track model degradation.
Stanford HAI's 2026 AI Index Report noted that US private AI investment reached $285.9 billion in 2025. AI data center power capacity hit 29.6 gigawatts, comparable to New York state at peak demand. Annual GPT-4o inference water use may exceed the drinking water needs of 12 million people. Those environmental and infrastructure pressures will eventually flow through to pricing.
The current moment resembles the early cloud computing era when per-instance pricing dropped while total cloud spending climbed. The difference is that AI consumption scales faster and less predictably than traditional compute. A viral feature or unexpected usage pattern can multiply costs overnight. Organizations are discovering that cheaper tokens do not mean cheaper AI, just more consumption at lower unit economics until the bill arrives.
Sources:
Richard Nieva, "The 'AI Gods' Spending As Much As They Can On AI Tokens," Forbes, March 31, 2026.
Victor Tangermann, "The Horrible Economics of AI Are Starting to Come Crashing Down," Futurism, April 24, 2026.
Thomas Claburn, "Tokenmaxxing isn't an AI strategy," The Register, April 26, 2026.
Victor Coimbra, "Is AI really getting cheaper? The token cost illusion," Artefact, April 1, 2026.
Hilary Sargent, "What Are AI Tokens — and Why Are They Costing Companies Millions?" Kelly Services, May 4, 2026.
Connie Loizos, "Are AI tokens the new signing bonus or just a cost of doing business?" TechCrunch, March 22, 2026.
Ghita Ghita, "The 5 Hidden Costs of Building an AI Startup in 2026," SemNexus, May 5, 2026.
Stanford HAI, "2026 Artificial Intelligence Index Report," 2026.
Zylo, "2026 SaaS Management Index," 2026.