ZDNET’s key takeaways
- Rising DRAM costs and more verbose chatbots will drive up prices.
- The industry seeks to mitigate costs with more efficient models.
- Users need to prioritize projects and consider polite prompting.
Whether you’re a user of an AI chatbot or a developer utilizing large language models to build apps, you’ll probably pay more for the technology this year. Thankfully, there are steps you can take to mitigate the cost.
We’re living in a token economy. Each piece of content — words, images, sounds, etc. — is treated by an AI model as an atomic unit of work called a token. When you type a prompt into ChatGPT and receive a paragraph in response, or call an API to do the same thing inside an app you’ve built, both the input and the output are counted as tokens.
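If you’re curious what those counts look like, the open-source tiktoken library gives a rough illustration; providers meter usage with their own tokenizers, so treat these numbers as estimates rather than a bill.

```python
# A minimal sketch of token counting with the open-source tiktoken library.
# Providers use their own tokenizers, so these counts are estimates only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common GPT-style tokenizer

prompt = "Summarize the attached quarterly report in three bullet points."
response = "Revenue rose 12%, margins held flat, and headcount stayed steady."

input_tokens = len(enc.encode(prompt))
output_tokens = len(enc.encode(response))

# Both sides of the exchange count toward the bill.
print(f"input: {input_tokens} tokens, output: {output_tokens} tokens")
```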
Also: The sneaky ways AI chatbots keep you hooked – and coming back for more
As a result, the meter is always running when you use AI, racking up costs per token, and the total bill is set to go higher in aggregate.
Rising chip costs
The most immediate reason for rising prices is the increasing cost — incurred by OpenAI, Google, Anthropic, and other operators of AI services — of building and running AI’s underlying infrastructure. As their costs go higher, so must the price of AI.
The biggest single cost is DRAM, the memory chips used to hold input tokens. Keeping those tokens in memory and storing them for later use requires an ever-increasing amount of DRAM.
Also: Here’s every AI subscription I paid for in 2025 – and which ones I’m taking into 2026
A supply crunch for DRAM chips, driven by the insatiable build-out of AI infrastructure, is pushing chip prices up 20% year over year, with no end in sight. Costs are rising even faster for the most cutting-edge memory for AI, known as HBM, or high-bandwidth memory.
“The gap between the demand and supply for all of DRAM, including HBM, is really the highest that we have ever seen,” Sanjay Mehrotra, CEO of Micron Technology, one of the biggest DRAM makers, told Wall Street analysts last month.
That chip inflation will be felt principally by the giants, such as Google, that build AI services like Gemini, but they’ll undoubtedly pass along the rising costs to users.
It’s not just DRAM, either. Many data centers are increasingly built with NAND flash chips, the same type used in your smartphone to store data on a long-term basis. They’re also surging in price, Micron’s CEO said.
The need to monetize
The second factor driving prices is that AI providers not only need to pass along the current cost of running their services, including higher DRAM and NAND prices; they also need to justify the years of future investment they have outlined. That has already led to price increases. With its flagship GPT-5.2 model, for example, OpenAI raised the price charged to developers from $1.25 per million input tokens for the earlier GPT-5.1 model to $1.75, a 40% per-token price hike.
OpenAI is under the greatest pressure to demonstrate it can monetize AI, given that it is currently losing money and has committed to over a trillion dollars in spending on AI. But the same pressure exists for Google and others.
Licensing copyrighted content
A third factor is the gradual emergence of content deals to secure rights to copyrighted material. AI models have been built on content scraped from the internet. Following numerous lawsuits against AI model creators, a partnership model is emerging in which model creators license content directly from rights holders.
Also: The most exciting AI tech I’ve tried at CES 2026 so far (including a cleaning robot)
The most prominent example is OpenAI’s deal, announced last month, with Disney to license over 200 characters from Disney, Marvel, Pixar, and Star Wars for use in short-form videos created by OpenAI’s Sora video-generation AI model. The deal includes Disney taking a billion-dollar stake in OpenAI and becoming an OpenAI customer, but that alone may not cover whatever royalties Disney will receive under the actual agreements, which were not disclosed by either party.
More such deals may happen as Disney and others pursue what they view as infringement of their rights. For example, Disney sent Google a cease-and-desist letter last month, claiming “massive scale” copyright infringement that involved using AI to “exploit and distribute” Disney’s content, according to Variety magazine.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Pricier and pricier access
A fourth factor driving up costs is a token count that continues to rise due to a combination of more complex AI model designs, individual users tasking chatbots with more complex requests, and enterprises putting AI models into production.
The AI models themselves are becoming more verbose, meaning they produce more output in response to each prompt on average, especially reasoning models, which generate extensive explanations as part of their output. Verbose output doesn’t change the per-token price, but it means the meter runs faster for developers who use APIs priced per token.
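For developers, the effect is easy to see with a toy calculation. The per-token rates below are made-up placeholders, not any vendor’s published prices; the point is that a more verbose answer to the same prompt costs proportionally more.

```python
# Back-of-the-envelope API cost under per-token pricing.
# These rates are hypothetical placeholders, not real vendor prices.
INPUT_PRICE_PER_M = 1.75    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 10.00  # dollars per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call when both directions are metered per token."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

print(f"terse answer:   ${request_cost(500, 300):.4f}")
print(f"verbose answer: ${request_cost(500, 3000):.4f}")  # same prompt, 10x the output
```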
Also: AI killed the cloud-first strategy: Why hybrid computing is the only way forward now
User habits can also drive up costs. As more people use chatbots daily, they are likely to become more comfortable inputting long documents to request analysis.
Again, for the average individual chat user on a subscription, that doesn’t change the monthly subscription price. However, it can push users to upgrade to more expensive plans.
The Pro version of ChatGPT, for example, is $200 per month, versus $20 for the base Plus subscription. Google’s Gemini Ultra is priced at $250 per month — again, multiples of the $20 Gemini Pro version.
The inference shift
A broader change poised to drive costs overall is the deployment of inference — the generation of actual predictions — into production. Training an AI model has a relatively predictable budget because it’s a contained experiment. All that changes when a company wants to really use AI on an ongoing basis.
Like consumers, corporate users of AI models will follow the trend of doing more and asking more, and, thereby, paying for more tokens of input and output.
The use of AI agents, which automatically generate more input and output as they operate, will create a level of token generation that has not yet been thoroughly explored. Remember, the meter is running, and costs only rise in aggregate as the meter continues to run.
Also: True agentic AI is years away – here’s why and how we get there
A report released in November by Chinese AI giant ByteDance described how agents can significantly increase the number of tokens consumed.
“The token cost of an agentic interaction can grow faster than linearly with the number of turns,” the authors relate. “In every single typical agentic loop, the entire conversation history, including all previous prompts, tool calls, and their output, is fed back into the LLM as context for the next turn.”
The ByteDance paper concludes that the number of tokens rises as the square of the number of rounds of API access by an agent, “causing computational and financial costs to escalate rapidly.”
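The quadratic growth falls out of the loop structure itself. Here is a small sketch, with an assumed per-turn size, showing how re-sending the full history on every call makes cumulative input tokens grow roughly with the square of the number of turns.

```python
# Why agentic loops get expensive: if every turn re-sends the whole history,
# cumulative input tokens grow roughly with the square of the number of turns.
TOKENS_ADDED_PER_TURN = 1_000  # new prompt plus tool output per turn (assumed)

def cumulative_input_tokens(turns: int) -> int:
    total = 0
    history = 0
    for _ in range(turns):
        history += TOKENS_ADDED_PER_TURN  # the conversation history keeps growing
        total += history                  # and is fed back in full on every call
    return total

for turns in (5, 10, 20, 40):
    print(turns, cumulative_input_tokens(turns))
# Doubling the number of turns roughly quadruples the input tokens billed.
```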
What’s being done to mitigate costs
The chip world, the same world that is enjoying surging DRAM and NAND prices, is trying to avoid killing the proverbial golden goose by pricing AI out of reach for everyone.
To that end, chip vendors such as Nvidia advertise greater token throughput, the number of tokens that can be processed in, say, a second of average use.
For example, Nvidia CEO Jensen Huang — speaking this week at CES 2026 in Las Vegas — discussed the company’s forthcoming Rubin GPU chips and Vera CPU chips, which are set to go on sale later this year. Huang promised that Rubin “can deliver up to a 10x reduction in inference token costs” by processing them all faster, as ZDNET’s Cesar Cadenas relates.
Also: Why Nvidia’s new Rubin platform could change the future of AI computing forever
For a cloud provider like Google, that may mean a more efficient use of infrastructure. However, for the end user of AI, increasing the number of tokens processed per second doesn’t necessarily mean using fewer tokens; it simply means the meter is running faster.
Rubin may help things, but the real cost issue today is not processing tokens (the math that Rubin does); it’s the rising cost of DRAM and NAND to store all those tokens.
Steps are also being taken by model developers to make the inner workings of AI models more efficient. DeepSeek AI surprised everyone last year with a more efficient version of its technology, reducing the cost of running it.
Notably, DeepSeek AI’s forthcoming model update is expected to focus on DRAM memory savings, a reflection of the prominence of memory and storage issues.
Also: DeepSeek may be about to shake up the AI world again – what we know
When it comes to the spiraling cost of inference and agentic workflows, major software vendors may develop ways to help their customers.
We’ve seen in the past how consumption-based pricing for SaaS software led to dramatic spikes in corporate spending. Vendors such as Snowflake had to help corporate users who experienced sticker shock.
Snowflake’s approach was to identify ways to help customers reduce variable costs, such as those associated with data preparation and storage. You can expect 2026 will see similar instances of vendors trying to limit the damage to their AI customers by helping them plan their usage and monitor costs.
3 ways to save money
There’s nothing you can personally do about rising semiconductor prices. However, there are steps you can take to improve your use of the technology.
1. Comparison shop
You can find very general comparisons by typing something like “What can I get as a basic paid plan among the top AI model service providers?” into any of the chatbots.
I tried that with Gemini, ChatGPT, and Anthropic’s Claude, and all did a reasonably good job of quoting their own and others’ subscription offers. Perplexity also comes up as one of the commonly cited paid plans. I found Anthropic’s comparison listing the best organized, but, as with all things AI, your results may differ.
Also: The best AI chatbots of 2026: I tested ChatGPT, Copilot, and others to find the top tools now
Most vendors have buried the latest pricing plans for developers in their documentation. For example, here’s an API pricing page for Gemini from Google. OpenAI has a similar page for its API access. Overall, pricing for these services is not transparent, leading researchers to suggest that the government needs to step in with policy that at least requires pricing transparency.
Keep in mind that because of the varying abilities of the AI models, the per-token pricing can’t be an apples-to-apples comparison. One model’s simple answer to your question can become another model’s verbose answer that drives up the total cost.
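One way to make the comparison more honest is to estimate cost per task rather than per token. The figures below are hypothetical, but they show how a model that is cheaper per token can still be pricier per answer if it is more verbose.

```python
# Cost per answer, not per token, is the fairer comparison.
# All prices and token counts here are hypothetical.
models = {
    "model_a": {"output_price_per_m": 10.0, "avg_output_tokens": 400},
    "model_b": {"output_price_per_m": 6.0,  "avg_output_tokens": 900},
}

for name, m in models.items():
    per_answer = m["output_price_per_m"] * m["avg_output_tokens"] / 1_000_000
    print(f"{name}: ${per_answer:.4f} per answer")
# model_b is 40% cheaper per token, yet roughly 35% more expensive per answer.
```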
2. Live on a budget
Be selective about what you upload and how many responses you want to receive. If you’re an individual user of a chatbot, and you don’t want the cost of a paid subscription, a free bot may be fine for your needs.
As a business, try prioritizing. Projects that involve inputting a lot of corporate data and getting back verbose responses may need to be reconsidered based on per-token pricing. Some projects just may not be worth it, while others may be justified if they serve a specific goal of reducing costs or boosting corporate revenue.
If, as ByteDance researchers argue, agents increase token consumption by the square of the number of times an agent performs an action, then companies may need to temper agentic deployments. That could mean prioritizing tasks that promise a realistic return on investment; for example, the time saved by human coders.
Also: I test AI for a living, and these 3 free tools are the ones I use most
The ByteDance paper recommends a more nuanced strategy: Limit agents to a maximum number of “turns,” meaning the number of times the API is accessed. Setting limits on the number of turns, they write, may force the agent program itself to be more frugal with the tokens it consumes, such as via retrieval-augmented generation (RAG).
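A minimal sketch of that turn-capping idea follows; run_turn and task_is_done are hypothetical stand-ins for real model calls and completion checks, not any vendor’s API.

```python
# A sketch of a turn-capped agent loop. run_turn and task_is_done are
# hypothetical placeholders for real LLM calls and completion logic.
MAX_TURNS = 8  # hard budget on API round trips (assumed value)

def run_turn(context: list[str]) -> str:
    # Placeholder for a real model/tool call; this is where tokens are spent.
    return f"step result given {len(context)} context items"

def task_is_done(result: str) -> bool:
    # Placeholder completion check; a real agent would inspect the result.
    return "DONE" in result

def run_agent(task: str) -> str:
    history = [task]
    for _ in range(MAX_TURNS):           # the turn cap is the cost ceiling
        result = run_turn(history[-4:])  # pass only recent or retrieved context, RAG-style
        history.append(result)
        if task_is_done(result):
            break
    return history[-1]

print(run_agent("Summarize open support tickets"))
```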
Some commercial packaged software may prove more economical than direct API access. However, every package either adds an extra cost for AI, such as Microsoft’s Copilot in Microsoft 365, or charges for higher tiers that include AI use. The vendors, after all, have to recoup their own costs of developing AI.
Yet another consideration is which tasks can be done in batch form. When using an API, most providers offer lower per-token prices for processing a large batch of input and output tokens, typically overnight. Batch mode doesn’t return a prediction immediately, but it’s suitable for less time-sensitive projects.
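OpenAI’s Batch API is one example of this pattern. A hedged sketch is below; check the current documentation for exact parameters, discounts, and model names, which may differ from what’s assumed here.

```python
# A sketch of submitting work to OpenAI's Batch API for discounted, non-urgent
# processing. Verify parameters and model names against the current docs.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each JSONL line is one independent request.
requests = [
    {"custom_id": f"doc-{i}", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o-mini",  # assumed model name
              "messages": [{"role": "user", "content": f"Summarize document {i}."}]}}
    for i in range(3)
]
with open("requests.jsonl", "w") as f:
    for r in requests:
        f.write(json.dumps(r) + "\n")

batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(input_file_id=batch_file.id,
                              endpoint="/v1/chat/completions",
                              completion_window="24h")
print(batch.id, batch.status)  # results come back later, at a lower per-token rate
```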
3. Be polite to your bot
The verbose output of AI models is the most daunting cost factor, given that end users have no direct control over output tokens.
It turns out, however, there are indirect ways to exert control.
A surprising technique backed by academic research is to be polite to the chatbot. My colleague David Gewirtz has written about the ethical virtues of writing to chatbots in a polite tone of voice. There are also economic reasons.
Researchers at the University of Iowa, in November, used OpenAI’s API to study how slight changes in the way a prompt is phrased affected the number of tokens generated by ChatGPT.
Also: I’ve studied AI for decades – why you must be polite to chatbots (and it’s not for the AI’s sake)
The authors compared 20,000 actual English-language prompts and their responses gathered from GPT-4 interactions. They analyzed the language used, discerning whether each prompt contained explicit politeness, such as “please” and “thank you,” or implicit politeness, such as “could you” or “would you.”
They then tested what happens when the prompt is turned into its opposite, such as “Write a critique…” with no “please” included.
“We find that non-polite prompts lead to higher token generation compared to polite prompts,” they wrote. Specifically, non-polite prompts generate “more than 14 extra tokens” versus a polite version, using ChatGPT-4.
That excess token use is “equivalent to $0.000168 extra cost per prompt on average,” they found. “Subtle linguistic features can systematically affect how much an enterprise pays,” they concluded, and the excess adds up dramatically:
“The average daily queries to OpenAI’s API exceed 2.2 billion. Compared to a scenario in which all prompting is polite, when instead the prompts are non-polite, this generates an additional $369K revenue per day, simply due to the increase in tokens that non-polite prompts generate in the outcome. This is equivalent to a monthly revenue of $11M for OpenAI (which is roughly 3% of its total revenue).”
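Those headline figures follow directly from the per-prompt estimate; a quick sanity check using the study’s own numbers:

```python
# Sanity check on the study's figures.
extra_cost_per_prompt = 0.000168  # dollars per prompt, from the study
daily_queries = 2.2e9             # approximate daily API queries cited

daily_extra = extra_cost_per_prompt * daily_queries
print(f"${daily_extra:,.0f} per day")         # roughly $370,000
print(f"${daily_extra * 30:,.0f} per month")  # roughly $11 million
```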
The authors don’t know why phrases like “could you” and “please” lead to fewer tokens. It’s just one of those idiosyncrasies that make AI pricing less than transparent.
At least you know that adding a touch of politeness may be the simplest thing you can do to grapple with the ever-rising cost of AI.