The smallest models are carrying the biggest bets
OpenAI, Mistral, and Nvidia all shipped their smallest models this week, and far from being a concession, these tiny models are the linchpin of the agent revolution. GPT-5.4 nano costs pennies per million tokens. Mistral Small 4 activates just 6B of its 119B parameters per query. Nvidia rallied eight AI labs to build open models that run on desktop hardware. The industry is collectively betting that the next phase of AI depends not on building bigger brains, but on making intelligence cheap and fast enough to run everywhere, all the time. That is exactly what the agent era requires.
OpenAI
OpenAI ships GPT-5.4 mini and nano — its cheapest models yet hit free ChatGPT
OpenAI released GPT-5.4 mini and nano on March 17, bringing near-frontier reasoning to free ChatGPT users and offering API developers a nano model at $0.20 per million input tokens — designed explicitly to power sub-agents inside larger AI systems.
openai.com
The race everyone watches is at the top of the capability curve. Bigger models, higher benchmark scores. But this week, the most consequential releases were at the bottom.
OpenAI shipped GPT-5.4 mini and nano, bringing near-frontier reasoning to free ChatGPT users and, more importantly, pricing nano at $0.20 per million input tokens. That's not a rounding error on a research budget. That's a price point where you can run thousands of lightweight agents inside a single workflow and keep the bill under control. OpenAI is explicit about the intent: nano is built for "the sub-agent era," where the unit of deployment isn't one smart model but dozens of cheap ones coordinated by something larger.
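The sub-agent pattern described above can be sketched in a few lines. Everything here is illustrative: `call_model` is a hypothetical stand-in for whatever inference API you use, and the model names are placeholders, not real endpoints.

```python
# Sketch of the "sub-agent era" pattern: a larger coordinator model fans a
# task out to many cheap nano-class calls, then synthesizes the results.
# `call_model` is a hypothetical stand-in for a real chat-completion API.

def call_model(model: str, prompt: str) -> str:
    # Hypothetical: a real system would hit an inference endpoint here.
    return f"[{model}] {prompt[:30]}"

def coordinator(task: str, subtasks: list[str]) -> str:
    # Cheap sub-agents each handle one narrow piece of the task...
    partials = [call_model("nano", s) for s in subtasks]
    # ...and a larger model stitches the partial results into an answer.
    return call_model("frontier", task + " | " + " ; ".join(partials))

result = coordinator("summarise the report", ["read section 1", "read section 2"])
```

The point of the pattern is that the expensive model is called once, while the cheap model absorbs the bulk of the call volume.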
Mistral made a parallel move. Simon Willison covered the release of Mistral Small 4, a mixture-of-experts model with 119B total parameters that activates only 6B per query. It unifies reasoning, multimodal, and agentic coding capabilities in a single model, runs 40% faster than its predecessor, and ships under Apache 2.0. That last detail matters: full commercial use, no strings attached. Mistral is betting that the model powering your agent swarm should be something you can self-host, modify, and ship without a licensing call.
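The "119B total, 6B active" figure comes from mixture-of-experts routing: a router picks a few experts per token, so only a fraction of the weights run on any query. Mistral has not published Small 4's router details, so the expert count, expert size, and top-k value below are made-up numbers that merely illustrate the mechanism.

```python
import random

# Generic top-k mixture-of-experts routing sketch. All sizes are invented
# for illustration; they are NOT Mistral Small 4's real configuration.
NUM_EXPERTS = 16
EXPERT_PARAMS = 7_000_000_000   # hypothetical parameters per expert
TOP_K = 2                       # experts activated per token

def route(token_scores, k=TOP_K):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    return ranked[:k]

scores = [random.random() for _ in range(NUM_EXPERTS)]  # stand-in router logits
active = route(scores)
active_params = len(active) * EXPERT_PARAMS
total_params = NUM_EXPERTS * EXPERT_PARAMS
print(f"experts {active} active: "
      f"{active_params / 1e9:.0f}B of {total_params / 1e9:.0f}B params")
```

With these toy numbers, each token touches 14B of 112B parameters, which is why an MoE model can have the quality headroom of a large parameter count at something close to small-model latency.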
Then there's Nvidia, which announced the Nemotron Coalition at GTC. Eight AI labs including Cursor, LangChain, Mistral, and Perplexity will co-develop open frontier models that run on desktop hardware. The coalition members aren't chosen randomly. Cursor and LangChain contribute coding and agentic evaluation benchmarks, while Mistral provides the model architecture. The whole thing is designed to produce models optimised for the work agents actually do, not for leaderboard bragging rights.
Why small is the strategy
The pattern is worth stating plainly: three of the biggest infrastructure players in AI all shipped their smallest models in the same week. That's not a coincidence and it's not a retreat. It's a bet on where the volume will be.
Consider the economics. A single GPT-5.4 query might cost a fraction of a cent, but an agentic workflow that makes 500 model calls to complete one task needs each call to cost nearly nothing. At $0.20 per million tokens, nano makes that arithmetic work. At 6B active parameters per query, Mistral Small 4 makes the latency work. These aren't stripped-down models for budget customers. They're purpose-built for a world where AI systems call other AI systems hundreds of times per task.
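The arithmetic above is easy to make concrete. The $0.20-per-million price and the 500-call workflow come from the text; the average tokens per call is an assumption for illustration only.

```python
# Back-of-envelope cost of one agentic workflow on a nano-class model.
# Price and call count are from the article; tokens-per-call is assumed.
INPUT_PRICE_PER_M = 0.20   # dollars per million input tokens (nano pricing)
CALLS_PER_TASK = 500       # model calls to complete one task
TOKENS_PER_CALL = 2_000    # assumed average input tokens per sub-agent call

tokens_per_task = CALLS_PER_TASK * TOKENS_PER_CALL
cost_per_task = tokens_per_task / 1_000_000 * INPUT_PRICE_PER_M

print(f"{tokens_per_task:,} input tokens per task -> ${cost_per_task:.2f}")
# 1,000,000 input tokens per task -> $0.20
```

Under those assumptions, a 500-call workflow costs about twenty cents in input tokens. Run the same workflow on a frontier model priced tens of times higher and the per-task cost stops being negligible, which is the whole argument for nano-class pricing.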
This is the infrastructure shift that matters more than any single capability improvement. The agent era requires models that are fast enough to run in loops, cheap enough to run in parallel, and small enough to run everywhere. Desktop and edge hardware. Inside other models' reasoning chains. The constraint isn't intelligence anymore. It's unit economics.
The question for anyone building on top of these models is whether the quality holds. GPT-5.4 mini reportedly approaches the full GPT-5.4 on SWE-Bench Pro and OSWorld while running twice as fast. Mistral Small 4 handles 3x more queries per second than its predecessor. If those numbers hold up in production, the trade-off between capability and cost just got a lot more favourable.
I think we'll look back at this week as the moment the industry collectively decided that the next phase of AI isn't about building bigger models. It's about making the small ones good enough that you can run them everywhere, all the time, without thinking about the bill. The real question is what gets built once that constraint disappears.
Read the original on OpenAI