In an April interview with The Verge, Uber's CEO Dara Khosrowshahi admitted, almost as an aside, that the company's CTO had told him they had burned through their token budget by early April. The interview was about hiring, software-team structure, and how AI was reshaping the relationship between product managers, designers, and engineers. The token budget line was a throwaway. It shouldn't have been. It was the most important sentence in the conversation.

Uber is not a company that runs out of money in April by accident. They have one of the most sophisticated cloud finance functions in tech. If their AI spend caught them by surprise, yours will too.

Token spend is becoming a P&L conversation, and most organisations are still treating it like an experiments line.

The cloud FinOps curve, replayed at speed

I have watched this exact pattern before. In 2014, every CFO I worked with was discovering that "cloud is cheaper than on-prem" was only true for a few specific workloads. The bill kept growing. Engineering teams spun up environments and forgot about them. Data egress charges arrived without warning. Reserved instances were bought and never used. The discipline we now call FinOps emerged because the pain became too obvious to ignore.

That curve took the industry roughly a decade to flatten. Cloud Foundation, AWS Cost Explorer, GCP billing alerts, Spot.io, CloudHealth, the FinOps Foundation, dedicated FinOps engineers in every serious cloud-native business. Ten years of work to turn cloud spend from "scary line item" to "managed discipline."

Token spend will not get a decade. It's already moving faster, for three reasons.

Spend is far less predictable. A single coding agent on a long-running task can consume more tokens in an afternoon than a customer-support workflow uses in a month. A poorly tuned prompt can increase costs tenfold overnight. The variance between similar tasks is wild compared to the variance between similar EC2 instances.

Model choice changes economics by orders of magnitude. Routing a task to the wrong model can be 30x more expensive than routing it correctly. Most teams have no routing strategy. They picked one model in a proof-of-concept and never revisited the decision.

Caching, batching, and prompt design are not yet boardroom vocabulary. A team that knows how to use prompt caching, batch APIs, and disciplined prompt design will spend a fraction of what an undisciplined team spends for the same outcome. This is a skill that exists in maybe 5% of engineering organisations today.
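To make the caching lever concrete, here is a back-of-envelope sketch. The per-token prices and the 10x cache discount are illustrative assumptions, not any provider's actual rates:

```python
# Back-of-envelope comparison of cached vs uncached prompt cost.
# Prices below are illustrative assumptions, not any provider's actual rates.

INPUT_PRICE = 3.00          # $ per million input tokens (assumed)
CACHED_INPUT_PRICE = 0.30   # $ per million cached input tokens (assumed 10x discount)

def monthly_prompt_cost(system_prompt_tokens: int,
                        calls_per_month: int,
                        cache_hit_rate: float = 0.0) -> float:
    """Cost of re-sending the same system prompt on every call."""
    cached_calls = calls_per_month * cache_hit_rate
    uncached_calls = calls_per_month - cached_calls
    return (uncached_calls * system_prompt_tokens * INPUT_PRICE +
            cached_calls * system_prompt_tokens * CACHED_INPUT_PRICE) / 1_000_000

# An 8k-token system prompt sent 500k times a month:
no_cache = monthly_prompt_cost(8_000, 500_000)           # no caching
with_cache = monthly_prompt_cost(8_000, 500_000, 0.95)   # 95% cache hit rate
print(f"uncached: ${no_cache:,.0f}  cached: ${with_cache:,.0f}")
```

Under these assumed numbers the same workload drops from $12,000 to under $2,000 a month, from one technique alone, before batching and prompt trimming are even considered.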

Anthropic's own positioning of Claude Code for Enterprise tells you where the market is going. The product page emphasises contribution metrics, token usage, cost monitoring through OpenTelemetry, centrally managed permissions, file-access restrictions, and per-team observability. Those are not features for hobbyists. They are features for organisations that are about to learn what their AI spend really is.

Workflows are cheaper than agents

There is a specific insight from Zapier's Wade Foster that more leaders should hear. He has been saying for months that agents are less reliable and more expensive than deterministic workflows, so leaders should use workflows where reliability and cost matter, and reserve agents for ambiguity, recovery, and creation.

This is the most important practical FinOps lever your engineering organisation has, and almost nobody is using it.

Most teams default to "build an agent" when the actual problem is "automate a known sequence of steps." A deterministic workflow runs in milliseconds at near-zero variable cost. An agent runs in seconds, sometimes minutes, at meaningful per-task cost, with non-zero failure rates and the occasional spectacular blowup. The cost difference between these two approaches, applied across an enterprise, is the difference between AI being a margin builder and AI being the new shadow line item that ate the budget.

I have built both. On one recent engagement, the team had built an agent to triage incoming documents, classify them, and route them. Six months in, it was costing roughly fifty times what the same triage would have cost as a deterministic pipeline using a small classifier model and a few rules. The agent was more flexible, yes. The flexibility was being used about 4% of the time. The other 96% was a known sequence of steps being run through the most expensive possible machine.

The right architecture is almost always workflows where the path is known, agents where it isn't. Most teams are doing the opposite because agents are newer and more interesting to build.
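The tradeoff can be sketched as a back-of-envelope comparison. The per-document costs and the 4% ambiguity rate below are illustrative assumptions in the spirit of the triage example, not measured figures:

```python
# Illustrative comparison: agent-for-everything vs a hybrid that reserves
# the agent for ambiguous cases. All figures are assumptions for the sketch.

AGENT_COST_PER_DOC = 0.25       # assumed: multi-turn agent on a frontier model
PIPELINE_COST_PER_DOC = 0.005   # assumed: small classifier plus routing rules

def triage_cost(docs: int, ambiguous_rate: float) -> dict:
    """Route known-path docs through the pipeline; send only ambiguity to the agent."""
    ambiguous = int(docs * ambiguous_rate)
    routine = docs - ambiguous
    return {
        "agent_everything": docs * AGENT_COST_PER_DOC,
        "hybrid": routine * PIPELINE_COST_PER_DOC + ambiguous * AGENT_COST_PER_DOC,
    }

costs = triage_cost(100_000, ambiguous_rate=0.04)
print(costs)
```

On these assumptions, 100,000 documents cost $25,000 through the agent and under $1,500 through the hybrid, without giving up the agent's flexibility where it is actually used.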

What a CFO actually needs by Q3

If you are the CFO, COO, or finance lead of an organisation that runs more than a handful of AI use cases in production, there is a specific operating discipline you need before the end of this quarter. I'd put four things in place.

Per-team token budgets, with hard caps. Same model as cloud spend. Each team gets a monthly allowance, denominated in dollars or tokens, with alerts at 50%, 80%, and 100%. The cap is enforced at the API layer, not in a wiki. If a team needs more, they ask. This is the single highest-impact move you can make this year.
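A minimal sketch of what enforcement at the API layer can look like, assuming a gateway that charges each call against a team budget before forwarding it. The thresholds mirror the 50/80/100% policy above; storage and alerting backends are deliberately left abstract:

```python
# Sketch of a token-budget gate enforced at the API layer, not in a wiki.
# Persistence and alert delivery are left abstract; this shows the policy only.

from dataclasses import dataclass, field

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)

@dataclass
class TeamBudget:
    team: str
    monthly_cap_usd: float
    spent_usd: float = 0.0
    alerted: set = field(default_factory=set)

    def charge(self, cost_usd: float) -> bool:
        """Deny the call if it would exceed the cap; otherwise record and alert."""
        if self.spent_usd + cost_usd > self.monthly_cap_usd:
            return False  # hard cap: enforced here, before the model call
        self.spent_usd += cost_usd
        for t in ALERT_THRESHOLDS:
            if self.spent_usd >= self.monthly_cap_usd * t and t not in self.alerted:
                self.alerted.add(t)
                # alerting hook (Slack, PagerDuty, email) would fire here
        return True
```

The point of the sketch is the shape: the gate sits in the request path, the alerts are threshold state rather than dashboard folklore, and an over-budget call fails closed.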

Unit economics per workflow. For every AI-driven workflow in production, you should know its cost per execution. Cost per resolved support ticket. Cost per generated report. Cost per closed code review. Without this number, you cannot make any honest decision about whether the AI workflow is profitable. With it, you can compare variants, justify investment, and spot regressions.
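The calculation itself is small. A sketch, with token counts, prices, and success rates as illustrative assumptions; the useful denominator is successful outcomes, not calls:

```python
# Minimal unit-economics calculation: dollars per *successful* outcome,
# amortising failed attempts over the successes. All inputs are assumptions.

def cost_per_outcome(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float,
                     calls_per_outcome: float, success_rate: float) -> float:
    """Prices are $ per million tokens; e.g. cost per resolved ticket."""
    per_call = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_call * calls_per_outcome / success_rate

# e.g. support triage: ~3 model calls per ticket, 92% resolved without escalation
print(round(cost_per_outcome(4_000, 800, 3.0, 15.0, 3, 0.92), 4))
```

Run this per workflow, per week, and regressions (a prompt change that doubles output tokens, a drop in success rate) show up as a moving number rather than a surprise invoice.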

A model routing policy. Most organisations are still defaulting to the most capable model for every task. That is genuinely expensive. A simple extraction task can cost 30x more on a frontier model than on a small one fine-tuned for the job. A routing layer that sends simple tasks to smaller models, harder tasks to larger ones, and routine extractions to cached responses can cut spend by 50% to 80% with no perceptible quality loss. The engineering effort is small. The financial impact is enormous.
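A routing policy can start as a few lines of code. A sketch with placeholder model tiers and made-up prices (chosen to echo the ~30x spread; none of these are real SKUs):

```python
# Sketch of a routing policy: cheap model by default, escalate on signals.
# Tier names and prices are placeholders, not real models or rates.

MODELS = {
    "small":    {"price_per_mtok": 0.50},   # ~30x cheaper than the frontier tier
    "mid":      {"price_per_mtok": 3.00},
    "frontier": {"price_per_mtok": 15.00},
}

def route(task_type: str, needs_reasoning: bool, prior_failure: bool) -> str:
    """Escalate only on demonstrated need; routine tasks never see the expensive model."""
    if prior_failure or needs_reasoning:
        return "frontier"
    if task_type in {"extraction", "classification"}:
        return "small"
    return "mid"
```

Even this crude version captures the key move: the frontier model becomes the escalation path, not the default, and a failed cheap attempt can retry upward rather than everything starting at the top.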

Observability that includes cost, not just performance. Honeycomb, Datadog, and most major observability platforms now expose AI cost telemetry, typically ingested via OpenTelemetry. If your engineering teams can see latency and error rate but not cost-per-call, they cannot optimise the third dimension. Show them the bill.
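Even without a full platform, cost can be tracked alongside latency. An illustrative sketch in plain Python, with a dict standing in for the metrics backend a real deployment would reach through OpenTelemetry or its observability vendor; the workflow name and per-call cost are hypothetical:

```python
# Illustrative cost-per-call instrumentation. A real deployment would emit
# these as metrics through OpenTelemetry; a plain dict stands in here.

import functools
import time
from collections import defaultdict

METRICS = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0, "latency_s": 0.0})

def track_cost(workflow: str):
    """Decorator: the wrapped function must return (result, cost_usd)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.monotonic()
            result, cost = fn(*args, **kwargs)
            m = METRICS[workflow]
            m["calls"] += 1
            m["cost_usd"] += cost
            m["latency_s"] += time.monotonic() - start
            return result
        return inner
    return wrap

@track_cost("doc_triage")            # hypothetical workflow name
def classify(doc: str):
    return ("invoice", 0.0031)       # assumed per-call cost in dollars

classify("sample document")
```

The design choice worth copying is that cost is recorded at the same point as latency, so the two can be graphed, alerted, and regression-tested together.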

The board reframe

For boards and investors reading this, the simplest question to ask the next executive who presents an AI strategy is: "What is your monthly token spend, and what controls do you have on it?" If the answer is "we're tracking it," the answer is no.

Token spend will follow the cloud spend curve, except the slope is steeper, the volatility is higher, and the lessons that took a decade to learn the first time are available for free in the second. There is no reason to repeat the mistake at a faster cadence. The companies that build AI FinOps discipline in 2026 will look like the companies that built cloud FinOps in 2018. The ones that don't will look like the ones still arguing over reserved instance utilisation in 2024.

If you'd like to talk through what an AI FinOps function would look like in your business, get in touch. I've sat on both sides of this conversation: engineering teams that built it well, and finance teams that wished engineering had built it sooner.

