How to Forecast AI Product Costs When Every User Action Has a Cost

Define the action that drives spend, model behavior by segment, and connect cost per action to COGS, gross margin, pricing, and runway — before scale hides the risk.

By Anastasiia Nikolaeva, July 5, 2026

Why every user action can become a cost driver

Many AI products no longer look like classic SaaS: one seat, one login, mostly fixed infrastructure. Each meaningful interaction can trigger model inference, tool calls, retrieval, storage, or human review — and each of those steps has a variable cost that scales with behavior, not with headcount.

That shift is structural. Directional industry benchmarks put AI-native products at roughly 50–60% gross margin (ICONIQ projects about 52% on average in 2026), versus 80–90% for traditional SaaS (a16z cites 60–80%+ for comparable SaaS). The gap is not there because founders are careless. It exists because inference and orchestration sit in cost of revenue and grow with usage. Hybrid pricing (subscription plus usage or overage) is becoming common precisely because flat seats hide the variance between light and heavy users.

The modeling job is to connect product behavior to financial outputs before you scale free usage or paid acquisition. That means choosing what counts as an action, estimating cost per action, and reading how those assumptions change margin and runway — not copying a provider rate card into one COGS cell.

If you are launching a generative feature and need the launch-specific pass — one request, plan caps, trial usage — start with how to forecast generative AI API costs before launch. This article goes wider: action-based economics across product types, cost metric design, segment modeling, and pricing alignment.

Choose the right unit of cost

Founders often start from the vendor bill: tokens, API calls, or GPU minutes. Those are useful inputs, but they are not always the right unit to forecast in. The cost unit should match what users do and what you may eventually price.

Common units include:

Prompt or message — chat, copilot, Q&A where each turn triggers inference
Generation — image, audio, slide, or document output where resolution and length change cost
Workflow run or agent task — multi-step automations with tool calls, retries, and optional human escalation
Minute or conversation — voice, realtime, or session-based products
Completed outcome — ticket resolved, report delivered, case closed — when value is outcome-shaped even if cost is step-shaped

Choose the cost unit from behavior

Prompt / message: fits chat, copilot, and Q&A. Watch: session depth varies widely.
Generation: fits image, audio, and document output. Watch: resolution and length change cost.
Workflow run: fits multi-step automations. Watch: steps × models × retries.
Agent task: fits tool calls, research, and actions. Watch: success rate and escalation.
Completed outcome: fits ticket resolved or report delivered. Watch: hard to meter before product proof.

The unit should match what users actually do — not what your vendor invoice labels as tokens or requests.

One product may use more than one unit internally. A writing assistant might meter chat turns separately from long-document generations. An agent product might count tasks while also tracking failed runs and human review time. The model should reflect the dominant behaviors that move COGS — even if finance eventually reports one blended line.
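A workflow-shaped cost unit is easier to reason about once its cost is written down as a small function. The Python sketch below treats one run as the sum of its step costs, inflated by an expected retry rate, plus a small chance of human escalation. Every number is hypothetical, and the retry model (at most one full re-run) is a simplifying assumption, not a description of any particular provider or product.

```python
# Hypothetical inputs: none of these step costs or rates come from a real
# provider; replace them with your own measured values.
STEP_COSTS = {"parse": 0.01, "summarize": 0.03, "extract": 0.02}  # $ per step
RETRY_RATE = 0.10        # assumed share of runs re-executed once, end to end
ESCALATION_RATE = 0.01   # assumed share of runs routed to human review
REVIEW_COST = 1.40       # assumed $ per human review


def cost_per_run(step_costs, retry_rate, escalation_rate, review_cost):
    """Expected cost of one workflow run.

    Assumes a run is retried at most once in full and that escalation
    is a flat per-run probability.
    """
    one_pass = sum(step_costs.values())          # each step priced on its own model
    with_retries = one_pass * (1 + retry_rate)   # expected passes per run
    return with_retries + escalation_rate * review_cost


run_cost = cost_per_run(STEP_COSTS, RETRY_RATE, ESCALATION_RATE, REVIEW_COST)
print(f"Expected cost per run: ${run_cost:.3f}")
```

Splitting the run cost this way shows which lever (a cheaper model on one step, fewer retries, lower escalation) actually moves cost per action.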

Where those costs sit in the forecast matters for investor interpretation. For the boundary between COGS and overhead, see which startup costs count as product-serving costs: generative AI and usage APIs typically belong close to gross margin, not buried in R&D.


Average usage vs heavy-user margin risk

Blended averages are seductive. "Users average 500 actions per month" can produce a comfortable gross margin on paper — while a minority of power users consume far more and erase profit on the same flat plan price.

AI-native economics compress margin when usage variance is high. Industry framing often compares flat per-seat pricing against hybrid or usage-aligned structures: the same subscription price can show strong margin on median behavior and negative unit economics on heavy agentic workflows. That is not a spreadsheet error — it is a product and pricing design signal.

Average usage can hide heavy-user risk

Blended average (500 actions / mo)

Plan price: $49 / mo
AI COGS / user: ~$12
Unit gross margin: ~75%

Heavy user (2,500 actions / mo)

Same plan price: $49 / mo
AI COGS / user: ~$58
Unit gross margin: Negative (about -18%)

Same subscription price, very different margin — flat pricing without usage guardrails exposes you to power users.

Model at least three behaviors: median paid user, upper-decile ("power") user, and free or trial user with pre-conversion usage. You do not need perfect distribution data early — but you do need to stress the plan caps and pricing promise against a heavy-user path before you scale acquisition.
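The median-versus-heavy comparison above comes from one formula applied at two usage levels. In this Python sketch, the $0.0232 cost per action is an assumption inferred from the card's rounded figures (about $12 at 500 actions and $58 at 2,500). With that figure, the median margin prints as 76% rather than the card's rounded ~75%.

```python
PLAN_PRICE = 49.00        # $ per month, from the illustration
COST_PER_ACTION = 0.0232  # assumed; reproduces ~$12 at 500 and ~$58 at 2,500 actions


def unit_gross_margin(price, actions, cost_per_action):
    """Model-cost-only gross margin for one subscriber on a flat plan."""
    cogs = actions * cost_per_action
    return (price - cogs) / price


for label, actions in [("median", 500), ("heavy", 2_500)]:
    margin = unit_gross_margin(PLAN_PRICE, actions, COST_PER_ACTION)
    print(f"{label:>6}: {actions:>5} actions -> margin {margin:.0%}")

# Usage level at which model cost alone consumes the full plan price.
breakeven_actions = PLAN_PRICE / COST_PER_ACTION
print(f"Break-even: about {breakeven_actions:,.0f} actions / month")
```

The break-even line is the useful stress-test number: under these assumptions, any subscriber above roughly 2,100 actions per month costs more to serve than the plan earns, before any other COGS.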

Unit economics views make the gap visible at the subscriber level — ARPA, COGS per active subscriber, and unit gross margin — separate from company-level P&L gross margin. For how to read those metrics without over-interpreting blended numbers, see how to read startup unit economics.

Free usage, trial usage, and paid usage

Action-based cost hits different segments at different times. Free and trial users often generate COGS before revenue appears. Paid users inherit plan caps. Outliers on paid plans are where unlimited or generous caps turn into margin leaks.

Model segments separately

Free tier

Users: 1,200 active

Per user: 8 actions / month

Total: 9,600 actions / month

$0 direct revenue

Persistent non-paying cost

Trial

Users: 180 in trial

Per user: 40 actions / month

Total: 7,200 actions / month

$0 before conversion

Cost concentrated in trial window

Paid — median

Users: 320 subscribers

Per user: 120 actions / month

Total: 38,400 actions / month

Base paid-user economics

Within plan caps at median usage

Paid — outliers

Users: 12 subscribers

Per user: 900 actions / month

Total: 10,800 actions / month

7.5× median usage

Flat-price margin risk

In this illustration, non-paying users generate about 25% of all actions. Just 3.6% of paid users generate about 22% of paid actions.

The important pattern is not simply that free users cost money. In this illustration, free and trial users create roughly a quarter of all actions before direct revenue appears. Among paid customers, only 12 outliers represent 3.6% of subscribers but generate about 22% of paid actions. A single blended "users × average actions" assumption hides both effects — non-paying usage that hits COGS early, and a small paid cohort that can dominate total paid actions on a flat plan.
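The segment shares above are easy to recompute, which makes it harder for a blended cell to hide them. This Python sketch reuses the illustration's figures; the `Segment` class and segment names are hypothetical labels for the sketch, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    users: int
    actions_per_user: int
    paying: bool

    @property
    def actions(self) -> int:
        return self.users * self.actions_per_user


segments = [
    Segment("free", 1_200, 8, paying=False),
    Segment("trial", 180, 40, paying=False),
    Segment("paid_median", 320, 120, paying=True),
    Segment("paid_outliers", 12, 900, paying=True),
]

total_actions = sum(s.actions for s in segments)
non_paying_actions = sum(s.actions for s in segments if not s.paying)
paid = [s for s in segments if s.paying]
paid_actions = sum(s.actions for s in paid)
outliers = next(s for s in paid if s.name == "paid_outliers")

print(f"Non-paying share of all actions: {non_paying_actions / total_actions:.1%}")
print(f"Outlier share of subscribers: {outliers.users / sum(s.users for s in paid):.1%}")
print(f"Outlier share of paid actions: {outliers.actions / paid_actions:.1%}")
```

Keeping each segment as its own row means a change to trial length or free-tier limits moves only the rows it should.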

In the forecast, model each segment separately. Time-boxed trials concentrate non-paid COGS in a conversion window; freemium spreads it across a persistent free base. Access design changes the shape of AI COGS as much as model pricing does.

How AI cost changes gross margin, LTV, and runway

Action-based AI cost is a bridge variable. It connects product behavior to COGS, COGS to gross margin, gross margin to LTV and CAC tolerance, and total burn to runway — especially when free and trial tiers consume inference before MRR catches up.

Action-based cost in the model chain

Actions per user: behavior × caps × product surface
Cost per action: model, steps, retries, escalation
COGS and gross margin: company P&L and unit economics
LTV and CAC tolerance: what acquisition spend can carry
Cash and runway: free/trial bleed plus paid scale

AI product cost is not an isolated COGS line — it changes whether pricing, acquisition, and runway assumptions still hold together.

Company P&L gross margin uses total revenue and total COGS, including free and trial serving cost. Unit gross margin on paid subscribers leaves non-paid serving cost out of the per-subscriber read. Both views matter: the first tells you whether the business model works at scale; the second tells you whether each paid customer you acquire can carry their own serving cost and acquisition spend.

When AI COGS rises faster than ARPA — because actions per user climb, caps are generous, or conversion lags — LTV compresses and acceptable CAC falls even if signup volume looks healthy. Runway shrinks from the combination of acquisition spend and usage-linked COGS on segments that are not yet revenue. Cash timing is covered in how to read startup cash flow and runway.
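One common simplification makes the LTV and CAC link concrete: gross-margin LTV as monthly gross profit per account divided by monthly churn. In the Python sketch below, the 3% monthly churn, the 3:1 LTV-to-CAC target (a widely used rule of thumb, not a requirement), the $79 ARPA, and the jump in AI COGS from $11 to $30 per account are all hypothetical inputs chosen only to show the direction of the effect.

```python
MONTHLY_CHURN = 0.03  # hypothetical monthly logo churn
LTV_TO_CAC = 3.0      # rule-of-thumb target, an assumption
ARPA = 79.00          # hypothetical $ per paid account per month


def gross_margin_ltv(arpa, cogs_per_account, monthly_churn):
    """Simple steady-state LTV: monthly gross profit divided by monthly churn."""
    return (arpa - cogs_per_account) / monthly_churn


for cogs in (11.00, 30.00):  # AI COGS per account: baseline vs. usage creep
    ltv = gross_margin_ltv(ARPA, cogs, MONTHLY_CHURN)
    max_cac = ltv / LTV_TO_CAC
    print(f"COGS ${cogs:>5.2f}: LTV ${ltv:,.0f}, "
          f"max CAC at {LTV_TO_CAC:.0f}:1 ${max_cac:,.0f}")
```

Nothing about signups changes between the two rows; only AI COGS per account rises, and the acquisition spend each customer can justify falls with it.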

A practical action-based cost example

Consider a fictional B2B workflow product: users run AI-assisted document reviews. The founder defines the cost unit as one review run (parse, summarize, extract fields — typically one orchestrated workflow, not one chat message). The snapshot below uses $0.08 per run, a $79 Pro plan with 500 included runs, and month-six portfolio assumptions you can audit line by line.

Document review workflow — month 6 snapshot

Cost per review run

$0.08

Pro plan

$79 / mo

Included allowance

500 runs

User economics

Median paid user

Usage: 120 runs / month

AI COGS: $9.60

Model-cost margin: ≈ 88%

Before other COGS

Power user

Usage: 600 runs / month

AI COGS: $48

Model-cost margin: ≈ 39%

100 runs above included allowance

Free tier

Users: 800 active

Per user: 15 runs / month

Total runs: 12,000 / month

AI COGS: $960

$0 direct revenue

Runway burden before conversion

Month 6 portfolio

Revenue

240 paid × $79 = $18,960 MRR

Paid usage

137.5 blended runs / user
× $0.08 = $11 AI COGS / user
240 × $11 = $2,640 paid AI COGS

Free-tier cost

800 × 15 × $0.08 = $960

Simplified model-cost margin

Paid users only: ≈ 86%

After free-tier AI cost: ≈ 81%

Before other COGS — AI model cost only

The free tier reduces simplified margin by about 5 percentage points before other COGS.

At the user level, the median paid customer looks healthy: 120 runs at $0.08 produces $9.60 AI COGS against a $79 plan — roughly 88% model-cost margin before other COGS. A power user at 600 runs produces $48 AI COGS and roughly 39% margin on the same flat price, with 100 runs above the included allowance. The free tier adds $960 monthly AI COGS with no direct revenue — a runway line even when median paid economics look fine.

At portfolio level, 240 paid subscribers at a blended 137.5 runs per user ($11 AI COGS each) produce $2,640 paid AI COGS against $18,960 MRR — roughly 86% paid-user model-cost margin. Carrying the free tier's $960 AI COGS reduces that simplified margin to about 81%, a gap of roughly five percentage points before other COGS. Power users above the 500-run cap still need overage pricing or throttling — the blended average can look acceptable while outliers erode margin.

Sensitivity matters: increasing free-tier usage from 15 to 40 runs per user raises monthly AI COGS from $960 to $2,560 — about 2.7× — without adding revenue. That is why both usage distribution and non-paying segments belong in the model, not one portfolio average.
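The month-6 snapshot can be rebuilt from its inputs, so the sensitivity can be rerun rather than recomputed by hand. This Python sketch hard-codes the example's assumptions ($0.08 per run, $79 plan, 240 paid subscribers at a blended 137.5 runs, 800 free users). The function and dictionary keys are hypothetical, and the margins are model-cost only, before other COGS.

```python
COST_PER_RUN = 0.08
PLAN_PRICE = 79.00
PAID_SUBSCRIBERS = 240
BLENDED_RUNS_PER_PAID_USER = 137.5
FREE_USERS = 800


def portfolio(free_runs_per_user):
    """Simplified month-6 portfolio: MRR, AI COGS, and model-cost margins."""
    mrr = PAID_SUBSCRIBERS * PLAN_PRICE
    paid_cogs = PAID_SUBSCRIBERS * BLENDED_RUNS_PER_PAID_USER * COST_PER_RUN
    free_cogs = FREE_USERS * free_runs_per_user * COST_PER_RUN
    return {
        "mrr": mrr,
        "paid_cogs": paid_cogs,
        "free_cogs": free_cogs,
        "paid_margin": (mrr - paid_cogs) / mrr,
        "margin_after_free": (mrr - paid_cogs - free_cogs) / mrr,
    }


for runs in (15, 40):
    p = portfolio(runs)
    print(
        f"free runs/user {runs:>2}: free COGS ${p['free_cogs']:,.0f}, "
        f"paid margin {p['paid_margin']:.1%}, after free {p['margin_after_free']:.1%}"
    )
```

Under the same assumptions, moving free usage from 15 to 40 runs also pulls the simplified margin after free-tier cost from about 81% to roughly 73%, while paid-user margin does not move at all.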

How AI feature design should shape pricing architecture

Usage-based pricing is not automatically the right answer. The internal cost unit and the customer-facing pricing metric can be different. What matters is that the plan architecture does not hide a usage pattern that destroys margin. A useful pricing decision connects three things: what creates cost, what the customer values, and what the forecast shows about median and high-usage behavior.

The chain is: AI feature design → underlying cost behavior → customer value unit → pricing architecture → financial-model test. You do not have to meter exactly what the model bills internally — but you do need a plan shape that survives the usage distribution your feature encourages.

Feature design → pricing architecture

Chat / copilot

What creates cost: Tokens, context length, tool calls, high-frequency sessions
Pricing response: Seat + fair-use allowance; included usage + overage; tiers by model access
Model test: Median usage, p90/p95 usage, AI COGS per paid account, margin under included limits

Generation

What creates cost: Generations, model choice, quality/resolution, retries
Pricing response: Credits, included generations, output packs, overage
Model test: Cost per usable output, repeat-generation rate, free-user cost, median vs heavy-user margin

Multi-step workflow / agent

What creates cost: Model calls, retrieval, tools, workflow steps, retries
Pricing response: Base subscription + included runs; workflow bundles; overage; usage tiers
Model test: Average cost/run, p95 cost/run, success rate, retry rate, escalation, margin by workflow type

Outcome-oriented automation

What creates cost: Attempts, failed runs, retries, tool usage, escalation
Customer value unit: Completed outcome
Pricing response: Per outcome; base + outcome fee; outcome bundles
Model test: Attempts per successful outcome, cost per attempt, success rate, cost per successful outcome, failed-run cost

The cost unit constrains pricing. The value unit determines what customers want to buy. The forecast tests whether the two can coexist at a healthy margin.
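The "Model test" rows ask for median and p90/p95 usage, not just the average. This Python sketch borrows the $79 plan, $0.08 per run, and 500-run allowance from the document-review example; the 20-account usage sample is invented for illustration. It uses a nearest-rank percentile, one of several percentile definitions, chosen here because it always returns an observed value.

```python
import math
import statistics

PLAN_PRICE = 79.00
COST_PER_RUN = 0.08
INCLUDED_RUNS = 500

# Hypothetical monthly runs for 20 paid accounts (illustrative only).
usage = [40, 60, 80, 90, 100, 110, 115, 120, 120, 125,
         130, 140, 150, 160, 180, 200, 250, 320, 480, 650]


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


points = {
    "median": statistics.median(usage),
    "p90": nearest_rank(usage, 90),
    "p95": nearest_rank(usage, 95),
}
for label, runs in points.items():
    cogs = runs * COST_PER_RUN
    margin = (PLAN_PRICE - cogs) / PLAN_PRICE
    print(f"{label:>6}: {runs:>6} runs, AI COGS ${cogs:6.2f}, margin {margin:.0%}")

over_cap = sum(1 for runs in usage if runs > INCLUDED_RUNS)
print(f"Accounts above the {INCLUDED_RUNS}-run allowance: {over_cap} of {len(usage)}")
```

Reading margin at p90 and p95 next to the median shows how much of the pricing question sits in the tail rather than the center of the distribution.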

Outcome-based pricing illustrates the distinction. A completed outcome is usually a customer value unit or pricing unit — not the underlying cost driver. Cost still accumulates through model calls, tool calls, retries, failed attempts, workflow steps, and human escalation. Outcome pricing can align with customer value, but it works only if the founder models cost per successful outcome, not only price per successful outcome — including the cost of unsuccessful attempts.
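The difference between price per outcome and cost per successful outcome fits in one line: every attempt is paid for, so per-attempt cost is spread over successes only. This Python sketch assumes independent attempts with a constant success rate (so expected attempts per success is 1 / success rate) and a flat per-attempt escalation probability. The $0.50 attempt cost, 80% success rate, 5% escalation rate, and $4 review cost are hypothetical.

```python
def cost_per_successful_outcome(cost_per_attempt, success_rate,
                                escalation_rate=0.0, escalation_cost=0.0):
    """Expected serving cost per successful outcome.

    Failed attempts still cost money, so the per-attempt cost (including
    any human escalation) is divided by the success rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    per_attempt = cost_per_attempt + escalation_rate * escalation_cost
    return per_attempt / success_rate


# Hypothetical inputs, not from the article.
naive = 0.50 + 0.05 * 4.00  # one attempt's cost, ignoring failed attempts
full = cost_per_successful_outcome(0.50, success_rate=0.80,
                                   escalation_rate=0.05, escalation_cost=4.00)
print(f"Naive cost per outcome: ${naive:.3f}")
print(f"Cost per successful outcome: ${full:.3f}")
```

With these inputs, ignoring failed attempts understates cost per outcome by a fifth, which is exactly the gap an outcome price has to absorb.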

Return to the document-review example above: at the median, 120 runs create $9.60 of AI COGS against a $79 plan, so the economics look comfortable. A 600-run power user creates $48 of AI COGS before other COGS. The pricing problem therefore does not come from the average user — it comes from the width of the usage distribution. A 500-run included allowance should be tested against actual heavy-user behavior: if power users stay rare, a fair-use cap may be enough; if they become material, test overage pricing, higher-usage tiers, or a different package architecture. There is no universal answer — only a forecast comparison.
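The options named above can be compared for the 600-run power user in a few lines. This Python sketch uses the example's $79 plan, $0.08 per run, and 500-run allowance; the $0.15 overage rate is hypothetical, and "throttle" stands in for a fair-use cap that stops serving runs above the allowance. It holds the user's behavior fixed, so it does not capture how overage or throttling might change usage or retention.

```python
PLAN_PRICE = 79.00
COST_PER_RUN = 0.08
INCLUDED_RUNS = 500
OVERAGE_PRICE = 0.15  # hypothetical $ per run above the allowance


def power_user_margin(runs, policy):
    """Model-cost margin for one account under a given cap policy."""
    if policy == "flat":          # all runs served at the plan price
        served, revenue = runs, PLAN_PRICE
    elif policy == "overage":     # runs above the allowance are billed
        served = runs
        revenue = PLAN_PRICE + max(0, runs - INCLUDED_RUNS) * OVERAGE_PRICE
    elif policy == "throttle":    # runs above the allowance are not served
        served, revenue = min(runs, INCLUDED_RUNS), PLAN_PRICE
    else:
        raise ValueError(f"unknown policy: {policy}")
    cogs = served * COST_PER_RUN
    return (revenue - cogs) / revenue


for policy in ("flat", "overage", "throttle"):
    print(f"{policy:>8}: {power_user_margin(600, policy):.1%}")
```

Running the same function across the whole usage distribution, rather than one account, is what turns this into the forecast comparison the paragraph describes.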

Practical pricing decision triggers

Flat or seat-based pricing can remain viable when:

AI COGS is small relative to price; usage distribution is narrow; heavy users are rare; margin stays healthy at high-percentile usage.

Caps or overage become worth testing when:

Heavy users materially compress margin; included usage creates cross-subsidy; usage distribution is wide; higher usage reflects additional customer value.

Outcome-based pricing can be tested when:

Customer value is measurable at the outcome level; success is observable; cost per successful outcome is reasonably predictable; failed attempts do not destroy margin.

Pure usage pricing may be a poor fit when:

Customers cannot predict usage; usage does not map cleanly to value; metering creates adoption friction.

Connect plan design to SaaS pricing and revenue model assumptions: included actions per plan, overage rates, billing mix, and churn. When workflow-shaped features sit on a flat plan without usage guardrails, heavy users can compress margin — a pattern Bessemer's AI pricing playbook describes through unit-economics stress tests, not a single mandatory pricing metric.

Test pricing scenarios against the same usage assumptions — not a separate pricing spreadsheet. For the same usage distribution, compare included cap, overage rule, plan price, conversion, gross margin, and runway in one model. Raising the included allowance is not only a COGS question: it may change conversion, actual usage, heavy-user mix, retention, and margin together. The forecast should expose that trade-off — not only whether doubling a cap doubles median COGS.

How to model this in Stavia

Action-based AI cost only becomes useful when it connects to pricing, access model, acquisition, P&L, unit economics, and cash runway in one monthly forecast. Changing actions per user or plan caps should move COGS, margin, and ending cash in the same model — not in isolated calculators.

In Stavia Models, the workflow for action-based AI economics typically runs through these layers:

Pricing & access model: Define paid plans, free trial or freemium, conversion, and churn. Non-paid tiers are the first place action-based COGS often appears.
Generative AI APIs (COGS): Add features (chat, images, video, or unit-shaped usage), estimate cost per request or unit, set utilization, and assign plan-level caps. The engine scales requests by active subscribers and non-paid users separately — trial/free COGS flows to non-paid totals; paid usage flows to plan-level and unit economics reads.
Product usage APIs: For non-generative metered APIs (email, SMS, maps, KYC), use the same cap-and-utilization pattern when those actions also scale with behavior.
P&L and unit economics: Read company gross margin on the P&L view; read Generative AI APIs inside COGS per active subscriber on the unit economics view. Compare blended averages to plan-level detail when caps differ by tier.
Cash flow and runway: See when free and trial COGS plus acquisition spend compress ending cash — especially before paid MRR catches up.

For the full modeling system and how its layers connect, see the startup financial modeling guide. For launch-specific generative feature setup, use the AI/API cost forecast guide as the companion piece for per-request estimation and plan stress tests.

Common mistakes

Final thought

AI product cost forecasting is a design discipline: pick the action that drives spend, model behavior by segment, and connect the result to pricing, margin, and runway. Provider prices change; the structure of your model should not depend on one rate card dated this quarter.

Founders who define the cost unit early — and stress heavy users and free tiers before scaling acquisition — can choose caps, pricing metrics, and funding plans with evidence. Founders who forecast from a single average often discover margin and runway pressure only after usage has already scaled.



About the author

Anastasiia Nikolaeva

Founder of Stavia Models

Anastasiia helps early-stage SaaS and AI founders build investor-ready financial models. She is the founder of Stavia Models and a startup finance consultant.
