DeepSeek Made Its 75% Price Cut Permanent. The Floor Just Fell Out of the Token Market.
DeepSeek V4 Pro now starts at $0.003625 per million tokens, runs a 1M context on Huawei silicon, and just turned a promotion into the new normal. What the price war means for everyone's API bill.
By Priya Raman · · 3 min read
DeepSeek's "promotional" 75 percent price cut was supposed to expire May 31. Instead, the company made it permanent: V4 Pro now runs $0.003625 to $0.87 per million tokens, down from $0.0145 to $3.48, with a one million token context window.
Anyone who has watched a retailer run a "limited time" sale knows what a promotion that never ends actually is. It's a price. The question worth your time is how DeepSeek can afford this price, and what it does to everyone else's.
The hardware story underneath
The economics rest on engineering. V4 Pro runs on Huawei Ascend 950 chips and reportedly operates at one quarter the per-token compute and one tenth the memory footprint of its predecessor. If those figures hold in practice, the 75 percent cut isn't predatory pricing in the dumping sense. It's cost pass-through from a company that found a fundamentally cheaper way to serve tokens and is spending the savings on market share.
That should worry the competition more than a subsidy would. Subsidies end. Cost curves compound.
Where the market now sits
The current per-million-token board, roughly cheapest to priciest:
| Model | Input | Output |
|---|---|---|
| DeepSeek V4 Pro | $0.003625 | $0.87 |
| Gemini 3.5 Flash | $1.50 | $9.00 |
| GPT-5 | $2.50 | $10.00 |
| Claude Opus 4.7 | $5.00 | $25.00 |
Lined up like that, the spread looks absurd: three orders of magnitude between the cheapest input tokens and the priciest. But a price war this lopsided doesn't hit everyone equally, and the damage map is worth drawing.
The frontier tier is mostly insulated. Teams paying Opus or GPT-5 rates are buying capability that doesn't have a cheap substitute, plus things DeepSeek can't sell many of them at any price: data governance their counsel will sign off on, and no geopolitical exposure. For US enterprises and anyone touching government work, a Chinese-hosted model is off the table regardless of cost, and they know it.
The squeeze lands on the middle. Startups, agencies, and indie developers running high-volume, low-stakes workloads, the classification, extraction, and summarization that makes up most production token traffic, now have a defensible reason to route it at a 99 percent discount. Gemini Flash's $1.50 input price was positioned as the value play three weeks ago. Against $0.003625 it reads like a luxury good.
What I'd actually do with this
Practical notes, since this is ultimately a procurement story:
The honest move is to benchmark, not assume. Run your actual workload against V4 Pro and your incumbent, score outputs blind, and let the quality delta argue with the price delta. For a lot of mid-tier tasks the quality gap has gotten small while the price gap got enormous.
Then weigh the non-price column: where your data goes, what your customers' contracts say, and what happens to your unit economics if this price proves unsustainable and snaps back. Cheap tokens you can't ship to production are worth exactly zero.
A different kind of price war
Every previous round of this price war was someone matching someone else's discount. This round is different in kind: a structural cost advantage, on non-NVIDIA silicon, converted into a permanent floor price. The labs can choose not to follow DeepSeek down, and the premium tier probably shouldn't. But the era of charging mid-market prices for mid-market intelligence ended on May 23, and GitHub's new metered billing means more developers than ever are watching the meter.
The cheapest token on the market is now effectively free. Plan your architecture, and your vendor's pricing power, accordingly.
enjoyed this one?_
Get the next one in your inbox.
One email every morning. The AI news that matters, decoded in five minutes.
up_next → Tools
ChatGPT's New Memory Actually Remembers You Now. That's the Feature and the Problem.
OpenAI's Dreaming V3 memory update lifts preference accuracy from 55% to 71% and keeps context fresh automatically. It also trims the audit trail, and that trade deserves more attention.