How to Forecast AI API Costs Before You Launch an AI Feature

The right question is not only what a model costs per call. It is whether your pricing, plan caps, and margins can support the feature before you ship — modeled from tokens or units, utilization, and trial or free usage, not from a guessed monthly vendor line.

By Anastasiia Nikolaeva

Why founders get AI API cost wrong

Most founders do not ignore AI API cost. They just think about it too late.

The usual pattern is familiar. A team wants to add an AI feature, checks the provider pricing page, sees that one prompt or one generated asset looks cheap enough, and moves on. The real modeling happens later, when the first meaningful bill arrives. That is understandable at prototype stage. It is a weak way to plan a feature that will sit inside a paid product.

The problem is that AI cost rarely breaks because of one request. It breaks because a request becomes a feature, the feature becomes part of a plan, and the plan gets used by real customers with real behavior. Once that happens, the economics depend on more than the provider's rate card. They depend on how the feature works, how often people use it, how much usage each segment gets, and whether your subscription price can carry that promise.

That is why the right founder question is not only "What does this model cost?" It is "Can this feature work inside my product before I ship it?"

Start from the feature, not the vendor bill

A lot of early-stage models begin with one line like "OpenAI cost" or "AI vendor cost." It looks practical, but it hides the real logic.

A stronger way to model AI API cost is to start from the feature itself. What exactly is the user doing? Are they chatting with an assistant, generating an image, creating a video clip, rewriting text, or summarizing a document? Until that is clear, the cost assumption is too abstract to help with product decisions.

This matters because founders are not really buying tokens in the abstract. They are designing a product experience and deciding how much of that experience to include inside each plan. The feature is the commercial unit that matters. Once you model it that way, the financial questions become much sharper. Which plans should include it? How generous should the allowance be? Should trial users get access? Is the current subscription price high enough to support it?

Those are pricing and margin questions as much as infrastructure questions.

What you need to estimate before you model AI API cost

You do not need invoice-level precision to make a good early-stage decision. You need a realistic first pass on five things; a minimal code sketch of these inputs follows the list.

1. The feature itself. What exactly is the user doing: chatting with an assistant, generating an image, creating a video clip, summarizing a document, or something else?

2. One average request. What does one normal use of the feature look like? For chat, that usually means one prompt plus one response. For image or video, it may mean one generated asset or one generated clip.

3. Monthly usage rhythm. How often will a real user use this feature in a month? Think in sessions and actions, not just “heavy” or “light” usage.

4. Access and caps. Who gets the feature, and how much of it? Free users, trial users, Basic, Pro, team plans — each can have a different allowance.

5. Pricing and margin fit. Once the feature is inside a plan, does the subscription price still support the cost? This is the real founder question.
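To make those five inputs concrete, here is a minimal sketch of them as a small data structure with a first sanity check against the plan price. The field names and every number are illustrative assumptions, not Stavia fields or any provider's rates.

```python
from dataclasses import dataclass

@dataclass
class AIFeatureAssumptions:
    feature: str               # 1. what the user is doing
    cost_per_request: float    # 2. estimated cost of one average request, in USD
    requests_per_month: float  # 3. realistic monthly usage rhythm per active user
    monthly_cap: int           # 4. allowance included in the plan
    plan_price: float          # 5. subscription price that has to carry the cost

# Hypothetical values for an AI chat feature inside a $29 plan
chat = AIFeatureAssumptions(
    feature="AI chat",
    cost_per_request=0.006,
    requests_per_month=60,
    monthly_cap=100,
    plan_price=29.0,
)

# First sanity check: expected usage, bounded by the cap, against the price
expected = min(chat.requests_per_month, chat.monthly_cap) * chat.cost_per_request
print(f"Expected AI cost per subscriber: ${expected:.2f}/month, "
      f"{expected / chat.plan_price:.1%} of the plan price")
```

If that one print already looks uncomfortable, no amount of later precision will save the packaging.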

[Screenshot: Stavia Inputs, Costs, COGS: Generative AI APIs with AI Chat on tokens, cost per action, and usage caps by plan]
Example setup: one AI chat feature translated into cost per request, then into plan-level monthly usage assumptions.

Define one request

This is where many articles become either too technical or too vague. We want the middle ground.

For financial modeling, you do not need to reproduce every line of a provider invoice. You need a realistic estimate of one average use of the feature.

Take AI chat as the simplest example. One average request usually means one user message plus one model response. In cost terms, that is the easiest way to think about input and output. Input is what goes into the model for that interaction: the user's prompt and whatever context or instructions you typically send with it. Output is what comes back from the model in the response.

You do not need perfect token science at this stage. You need a practical estimate of what a normal request looks like in your product. If your feature usually involves short prompts and concise answers, your average request will look very different from a research assistant that works with long context and long responses. The same principle applies outside chat. For image generation, one request may be one generated image. For video, one request may be one generated clip of a certain length.
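As a sketch of that arithmetic, here is one average chat request priced from estimated input and output tokens. The per-token rates are placeholders, not any provider's published pricing; substitute your own rate card.

```python
INPUT_PRICE_PER_1M = 3.00    # hypothetical USD per 1M input tokens
OUTPUT_PRICE_PER_1M = 12.00  # hypothetical USD per 1M output tokens

def cost_per_request(input_tokens: int, output_tokens: int) -> float:
    """Cost of one average request: prompt plus context in, response out."""
    return (input_tokens * INPUT_PRICE_PER_1M
            + output_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

# Short prompts with concise answers vs a long-context research assistant
print(f"Concise chat: ${cost_per_request(800, 300):.4f} per request")
print(f"Long context: ${cost_per_request(12_000, 1_500):.4f} per request")
```

The point of the two prints is the spread: the same feature category can differ by nearly an order of magnitude depending on how your product actually uses context.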

This is also where it is worth being explicit about simplification. Real provider pricing can be more detailed than an early-stage model. There can be caching, different SKUs, volume effects, or other pricing branches. For a founder deciding whether a feature basically fits the business model, it usually makes more sense to start with a conservative simplified request cost and refine it later than to wait for perfect precision.

Scale through usage

Once one request is concrete, the next question is not "What will our monthly vendor bill be?" The next question is "How will real users consume this feature?"

That is where many startup models still stay too thin. They jump from cost per request to a rough total spend estimate without thinking about usage rhythm. A more useful approach is to picture how the feature behaves in the product. How many times might a user open it in a month? How many requests happen in one session? How different is free usage from paid usage? Which plans should have higher limits because the feature is part of the value proposition, and which plans need tighter guardrails?

This is the step where AI cost becomes a product-design question. The same request cost can be harmless or dangerous depending on how much usage you include. A cheap request does not automatically mean a safe feature if the cap is too generous. And a more expensive request can still work well if it is placed in the right tier with the right allowance.

That is why the model should scale cost through user segments, monthly caps, and realistic utilization instead of through one blended average.
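A minimal sketch of that scaling step, assuming hypothetical segment sizes, caps, and utilization rates:

```python
COST_PER_REQUEST = 0.006  # the single-request estimate from the previous step

segments = {
    # plan: (active users, monthly cap in requests, expected utilization of cap)
    "Free":  (5_000, 20, 0.50),
    "Basic": (800, 200, 0.35),
    "Pro":   (150, 1_000, 0.40),
}

for plan, (users, cap, utilization) in segments.items():
    per_user = cap * utilization * COST_PER_REQUEST
    print(f"{plan:>5}: ${users * per_user:,.0f}/month total, ${per_user:.2f} per user")
```

Notice that in this example the tiny Free cap still produces a monthly bill comparable to the paid tiers, simply because the segment is large. That is the blended-average mistake made visible.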

AI chat example

A practical way to think about this is an AI assistant inside a subscription product.

Imagine the feature is useful enough to support conversion and retention, but not so core that every plan should get unlimited access. The founder wants a small allowance for free users, a more substantial limit for Basic, and a larger limit for Pro. The goal is not simply to "offer AI." The goal is to offer it in a way that still works economically inside the pricing ladder.

This is where the model becomes much more useful than a rough vendor estimate. Instead of asking whether the provider is affordable in general, the founder can ask whether the current product design is affordable. Is the free allowance small enough? Is Basic too generous for the price? Does Pro have enough room to justify the higher subscription? Is usage likely to be low, medium, or intense in practice?

The illustration above is one concrete version of that logic: one feature, one average request, then plan-level caps and expected use — without getting lost in every field on the screen.
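Expressed as a quick check, with hypothetical plan prices and the request cost from earlier: does each plan still carry its own allowance?

```python
COST_PER_REQUEST = 0.006

plans = {
    # plan: (monthly price in USD, included requests, expected utilization)
    "Free":  (0.0, 20, 0.50),
    "Basic": (12.0, 200, 0.35),
    "Pro":   (39.0, 1_000, 0.40),
}

for plan, (price, cap, utilization) in plans.items():
    ai_cost = cap * utilization * COST_PER_REQUEST
    if price:
        print(f"{plan:>5}: ${ai_cost:.2f} AI cost against a ${price:.0f} price, "
              f"{ai_cost / price:.1%} of the subscription")
    else:
        print(f"{plan:>5}: ${ai_cost:.2f} AI cost, carried by paid conversion")
```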

Caps and free use

Founders often focus on the model choice and not enough on the cap.

But in a subscription product, the cap is where the economics become real. It is the point where provider pricing turns into a commercial promise. Once you say that a free user gets some amount of AI access, a Basic customer gets more, and a Pro customer gets even more, you are no longer pricing infrastructure. You are pricing a product promise.

That is why free and trial usage deserve explicit treatment in the model. A feature can look cheap when you think about one request in isolation and much less comfortable once you allow meaningful pre-paid usage at scale. Early-stage products often feel pressure to make the AI feature visible before payment because it helps onboarding and conversion. That can make sense. But it should be modeled honestly, not hidden inside one average customer assumption.

The same logic applies to paid tiers. A founder may discover that the feature works well in higher plans but is too generous in the entry plan. Or that the free allowance should exist, but stay deliberately small. These are exactly the decisions the model should help surface before launch.
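One way to model that honestly is to price the free allowance against conversion rather than hiding it in an average. The allowance, utilization, and conversion rate below are illustrative assumptions:

```python
COST_PER_REQUEST = 0.006
FREE_CAP = 20            # requests included before payment
FREE_UTILIZATION = 0.50  # share of the cap free users actually use
CONVERSION_RATE = 0.03   # free-to-paid conversion

cost_per_free_user = FREE_CAP * FREE_UTILIZATION * COST_PER_REQUEST
cost_per_conversion = cost_per_free_user / CONVERSION_RATE

print(f"AI cost per free user:          ${cost_per_free_user:.2f}/month")
print(f"AI cost per converted customer: ${cost_per_conversion:.2f}")
```

Six cents of monthly free usage looks harmless; two dollars per converted customer is a number you can actually judge against your other acquisition costs.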

For how access mode shapes when that pre-paid usage shows up in the funnel, see Free Trial vs Freemium.

Margin and burn

A lot of discussions about AI feature cost stop at the vendor bill. That is not where the important founder decision ends.

The more useful question is what this feature does to the rest of the business. Once AI cost is modeled properly, it becomes part of COGS. That means it affects gross margin. And once gross margin changes, the implications reach further: contribution changes, the amount of cash the business keeps from revenue changes, and the room the company has to fund growth becomes tighter or looser.

This is why AI API cost should not be treated as a side calculation. It belongs in the operating model. A founder should be able to see not only that the feature "costs something," but whether it still leaves enough room inside the business model for the company to grow in a healthy way.
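A minimal sketch of that read-through, with an assumed ARPA and assumed non-AI COGS per subscriber:

```python
ARPA = 18.0               # average revenue per account, per month (assumed)
OTHER_COGS_PER_SUB = 3.2  # hosting, support, payment fees, etc. (assumed)
AI_COGS_PER_SUB = 1.1     # blended AI cost per subscriber from the usage model

margin_before = (ARPA - OTHER_COGS_PER_SUB) / ARPA
margin_after = (ARPA - OTHER_COGS_PER_SUB - AI_COGS_PER_SUB) / ARPA

print(f"Gross margin before the feature: {margin_before:.1%}")
print(f"Gross margin after the feature:  {margin_after:.1%}")
```

A six-point swing in gross margin is exactly the kind of effect that never shows up while the feature lives in a side spreadsheet.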

[Screenshot: Monthly Forecast, Generative AI APIs in COGS with costs by plan and feature over time]
Once the feature is modeled properly, AI API cost stops being a side note and becomes part of the company's real cost structure.

Where unit economics make the decision clearer

The COGS view tells you that AI cost exists. Unit economics tell you whether it still makes sense.

This is often the most useful place to judge the feature. Once AI API cost sits next to ARPA, contribution, and other per-subscriber costs, the founder can stop thinking in terms of abstract spend and start thinking in product economics. Is the feature still a manageable part of the plan? Is it small enough to support the current price? Is it quietly consuming too much of the value the subscription is supposed to create?

This is also where trade-offs become clearer. A feature may be viable in Pro and uncomfortable in Basic. A chat experience may fit the current pricing ladder while video does not. Or the feature may be worth keeping, but only with tighter caps or lower expected usage in lower tiers.
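A sketch of that per-plan judgment, with illustrative ARPA and cost lines and a deliberately arbitrary comfort threshold:

```python
plans = {
    # plan: (ARPA, non-AI COGS per subscriber, AI COGS per subscriber)
    "Basic": (12.0, 2.0, 1.50),  # a generous cap drives the AI line up
    "Pro":   (39.0, 4.5, 2.40),
}

THRESHOLD = 0.10  # arbitrary: flag plans where AI exceeds 10% of ARPA

for plan, (arpa, other_cogs, ai_cogs) in plans.items():
    contribution = arpa - other_cogs - ai_cogs
    ai_share = ai_cogs / arpa
    verdict = "ok" if ai_share < THRESHOLD else "revisit the cap or the price"
    print(f"{plan:>5}: contribution ${contribution:.2f}, "
          f"AI at {ai_share:.1%} of ARPA -> {verdict}")
```

With these numbers the feature clears Pro and fails Basic, which is exactly the trade-off described above.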

[Screenshot: Unit Economics, per-subscriber COGS including Generative AI APIs, contribution, and margin]
Unit economics make the feature easier to judge as part of the subscription, not just as a vendor bill.

Video and other features

The same modeling structure still works when the feature changes.

With video, image generation, or other usage-based AI features, the "one average request" is not necessarily prompt plus response anymore. It may be one generated asset, one render, one clip, or a certain number of generated seconds. But the planning logic stays the same. First estimate the cost of one normal action. Then decide who gets access, what the monthly cap looks like, and how much of that cap people are likely to use.

This matters because more expensive modalities can break the entry tier surprisingly quickly. A feature that feels like a strong differentiator can still be the wrong decision for a low-priced plan if the included allowance is too generous. That does not mean the feature is bad. It usually means the packaging needs more work.
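A quick sketch of why, with hypothetical per-second video pricing: a generous entry-tier cap can cost more than the plan itself.

```python
PRICE_PER_SECOND = 0.10  # hypothetical USD per generated second of video
CLIP_SECONDS = 8         # typical clip length for "one average request"
ENTRY_CAP_CLIPS = 30     # clips included in the entry plan
ENTRY_PLAN_PRICE = 12.0  # hypothetical entry-tier price

cost_per_clip = CLIP_SECONDS * PRICE_PER_SECOND
cap_cost = ENTRY_CAP_CLIPS * cost_per_clip

print(f"One clip: ${cost_per_clip:.2f}")
print(f"Fully used entry cap: ${cap_cost:.2f} against a ${ENTRY_PLAN_PRICE:.0f} price")
```

Even at half utilization, the allowance alone consumes the entire subscription. That is a packaging problem, not a feature problem.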

Common mistakes

The same failure patterns show up in most early-stage AI cost models:

- Modeling one blended "AI vendor cost" line instead of starting from the feature.
- Jumping from cost per request to total spend without a realistic usage rhythm.
- Leaving free and trial usage implicit instead of modeling it as an explicit allowance.
- Setting caps by instinct without checking them against the plan price.
- Stopping at the vendor bill instead of reading the cost through COGS and unit economics.

In Stavia

In Stavia, the cleanest workflow is to build one AI feature at a time.

Start by defining one average action. For chat, that means the typical prompt and response. For image or video, it means the typical generated unit. Then decide how that feature should be packaged across your product: what free users get, what paid plans get, and what utilization is realistic for each segment.

After that, do not stop at the input screen. Read the feature back through the model. Look at how it lands in COGS, then look at the unit economics. The point is not just to configure the feature. The point is to see whether your current pricing and plan structure can support it.

This is where Stavia is useful as an implementation layer. It takes the logic you would want in a serious early-stage model and keeps it connected to the rest of the business instead of leaving it in a side spreadsheet.

Pricing and plan design are part of the same story: the forecast only works when revenue assumptions and variable cost assumptions are allowed to talk to each other.

Conclusion

AI API cost becomes dangerous when founders treat it as a technology detail instead of a product decision.

The feature is what matters. Then the request. Then the way real users consume it. Then the cap, the plan, and the margin that has to carry it.

That is the level where financial modeling is useful before launch. Not because it predicts usage perfectly, but because it forces the right questions early enough to change the product, the packaging, or the price.

Test AI feature economics in your forecast

Start a free trial in Stavia Models: define one average action, caps by segment, and read generative AI through COGS and unit economics before you launch.

About the author

Anastasiia Nikolaeva

Founder of Stavia Models

Anastasiia Nikolaeva is a financial modeling consultant and the founder of Stavia Models. She has built financial models for SaaS, AI, marketplace, and other startup business models, helping founders plan pricing, growth, fundraising, and unit economics. Stavia Models is based on this hands-on consulting experience and turns that modeling logic into a guided product.

Consulting services and templates