OpenAI launches GPT-5.4, its most "capable" model yet with Thinking and Pro variants
OpenAI has announced GPT-5.4, which the company calls its most capable and efficient frontier model yet for professional work. The new model arrives in multiple versions designed for different workloads, including a standard model, a reasoning-focused variant called GPT-5.4 Thinking, and a high-performance option labeled GPT-5.4 Pro.
The announcement also highlights major upgrades under the hood. For developers, the API version of GPT-5.4 now supports context windows as large as 1 million tokens, easily the largest context capacity OpenAI has offered so far. The company says the new model can solve the same problems while using significantly fewer tokens than GPT-5.2, pointing to notable efficiency gains for large workloads.
Benchmarks show major gains for knowledge work
OpenAI says GPT-5.4 delivers strong results across several evaluation benchmarks focused on real-world tasks. The model reportedly achieved record scores in OSWorld-Verified and WebArena-Verified, both of which measure computer interaction capabilities. It also scored 83 percent on OpenAI’s GDPval benchmark, a test designed to measure performance on knowledge work tasks.

Another benchmark result comes from Mercor’s APEX-Agents evaluation, which focuses on professional skills in areas such as finance and law. GPT-5.4 reportedly also performs particularly well when creating long-form deliverables, including slide decks, financial models, and legal analysis.
OpenAI also says the new model improves reliability. Compared with GPT-5.2, GPT-5.4 is 33 percent less likely to make incorrect claims, while overall responses are 18 percent less likely to contain factual errors.
Alongside the model launch, OpenAI has also updated how the GPT-5.4 API handles tool calling, introducing a new system called Tool Search. Rather than loading definitions for every tool in system prompts, the model can now retrieve tool details only when needed. This change helps reduce token usage and can make large systems faster and cheaper to run.
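To illustrate the general idea of loading tool definitions on demand, here is a minimal Python sketch. It is not OpenAI's Tool Search API: the catalog, the tool names, and the `search_tools` helper are all hypothetical, the matching is a simple keyword overlap, and character counts stand in as a rough proxy for token usage.

```python
import json
import re

# Hypothetical tool catalog; names and schemas are illustrative only.
TOOL_CATALOG = {
    "get_weather": {
        "description": "Look up the current weather for a city.",
        "parameters": {"city": "string"},
    },
    "create_invoice": {
        "description": "Create a billing invoice for a customer.",
        "parameters": {"customer_id": "string", "amount": "number"},
    },
    "send_email": {
        "description": "Send an email message to a recipient.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
}


def tokenize(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def eager_prompt(catalog):
    """Serialize every tool definition up front (the older approach)."""
    return json.dumps(catalog, sort_keys=True)


def search_tools(catalog, query, limit=1):
    """Return full definitions only for the tools that best match the query."""
    query_words = tokenize(query)
    scored = []
    for name, spec in catalog.items():
        tool_words = tokenize(name.replace("_", " ") + " " + spec["description"])
        score = len(query_words & tool_words)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return {name: catalog[name] for _, name in scored[:limit]}


def lazy_prompt(catalog, query, limit=1):
    """Serialize only the definitions retrieved for this request."""
    return json.dumps(search_tools(catalog, query, limit), sort_keys=True)


if __name__ == "__main__":
    request = "weather in Paris"
    print("eager size:", len(eager_prompt(TOOL_CATALOG)))
    print("lazy size: ", len(lazy_prompt(TOOL_CATALOG, request)))
    print("retrieved: ", list(search_tools(TOOL_CATALOG, request)))
```

With only three tools the saving is small, but the gap grows with the size of the catalog, since the eager prompt scales with every registered tool while the lazy one scales only with the tools a request actually needs.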