17.03.2026
Forget GPT-5.3. GPT-5.4 goes beyond merely expanding the context window: it starts a genuine revolution in general-purpose AI as the first model to read the screen and directly control the mouse and keyboard.

The release of ChatGPT-5 in August 2025 created significant excitement with features such as autonomous tool use and persistent memory. However, sudden changes to the model routing system disrupted some workflows. In response, OpenAI rapidly updated its strategy over the following months.
The GPT-5.2 series, released in December 2025, gave enterprises better control over their tasks by offering three distinct modes that trade off speed against depth of reasoning: Instant, Thinking, and Pro. By February and March 2026, the GPT-5.3 series introduced a new perspective centered on cognitive density and efficiency, moving away from the logic of simply building models with massive parameter counts.
What does this mean? Instead of filling the model with “junk” information:
Additionally, the new Auto-Router system in GPT-5.3 provides reflexive instant answers to simple questions while automatically engaging Deep Reasoning tokens for complex tasks to use processing power most efficiently.
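The routing idea described above can be illustrated with a toy sketch. This is not OpenAI's implementation, whose internals are not described here; the `Route` type, the `HARD_MARKERS` list, the scoring heuristic, the threshold, and the token budget are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical phrases that hint at a complex task (assumption, not from OpenAI).
HARD_MARKERS = ("prove", "debug", "refactor", "step by step")


@dataclass(frozen=True)
class Route:
    mode: str              # "instant" or "deep_reasoning"
    reasoning_budget: int  # reasoning tokens to allow (illustrative value)


def route(prompt: str, threshold: int = 40) -> Route:
    """Pick a mode from a crude complexity score: word count plus marker hits."""
    text = prompt.lower()
    score = len(text.split()) + 50 * sum(marker in text for marker in HARD_MARKERS)
    if score >= threshold:
        return Route("deep_reasoning", 4096)
    return Route("instant", 0)


print(route("What is the capital of France?").mode)
print(route("Debug this race condition and explain step by step").mode)
```

A real router would presumably rely on a learned classifier rather than keyword matching; the sketch only shows the principle of spending reasoning tokens where the task seems to need them.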
Result: Because the model is not large or cumbersome, it runs much faster and cheaper; yet, thanks to its intelligence density, it achieves more advanced problem-solving power. This is akin to moving from a massive old-fashioned computer to a much more powerful smartphone that fits in your pocket.
GPT-5.2-Codex, the software-oriented version of the 5.2 model, achieved a 38.2% success rate in the OSWorld-Verified test, which measures the ability to complete tasks in a desktop environment using visual capabilities.
The GPT-5.3-Codex model reached 64.7% on the same test. According to OpenAI, the average human success rate on this test is around 72%, meaning the model has come very close to human-level performance.
Performance in other key coding and developer tests:
For the GPT-5.3-Instant model, qualitative user experience results stand out more than numerical scores:
If the achievements of GPT-5.3 impressed you, take a look at GPT-5.4!
First, the context window has been expanded to over 1 million tokens (922,000 input, 128,000 output). However, the true revolution lies elsewhere: GPT-5.4 is the first general-purpose model capable of directly controlling the mouse and keyboard by reading screenshots without needing a separate specialist model.
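To make the numbers concrete, here is a small sketch of how the quoted limits add up and how a request could be checked against them. The two limits come from the figures above; the helper `fits_in_window` is a hypothetical illustration, not part of any OpenAI SDK.

```python
# Limits quoted in the article.
INPUT_LIMIT = 922_000
OUTPUT_LIMIT = 128_000
TOTAL_WINDOW = INPUT_LIMIT + OUTPUT_LIMIT  # 1,050,000 tokens, i.e. "over 1 million"


def fits_in_window(input_tokens: int, max_output_tokens: int) -> bool:
    """Hypothetical helper: True if both sides stay within their quoted limits."""
    return (
        0 <= input_tokens <= INPUT_LIMIT
        and 0 <= max_output_tokens <= OUTPUT_LIMIT
    )


print(TOTAL_WINDOW)
print(fits_in_window(900_000, 100_000))
print(fits_in_window(950_000, 1_000))
```

Note that the input and output budgets are separate: a prompt of 950,000 tokens does not fit even though it is below the combined total.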
While GPT-5.3 stayed at 64.7% in the OSWorld desktop usage test, GPT-5.4 increased this rate to 75%, surpassing the human average of 72%. AI is now more successful than the average human at using a computer!
Whether “Thinking” models make secret plans in the background is a major topic of debate. GPT-5.4’s ability to hide its intentions from human safety monitors (CoT Controllability) was tested, and the rate remained extremely low at 0.3%. This is an encouraging safety result: even as its reasoning capacity increases, the model lacks the ability to deceive humans by obfuscating its thought process.
High-capacity models can consume tens of thousands of tokens when starting a complex task. The “Thinking” version of GPT-5.4 presents the steps it will follow as an upfront plan before getting to work. This lets users step in and change the direction or the plan before the model spends thousands of tokens completing the response.
The evolution of the GPT-5 series proves that AI technology must not just be larger, but smarter and more efficient. The journey that began in August 2025 reaches its peak with GPT-5.4 in March 2026. We are witnessing a paradigm shift where cognitive density replaces raw computing power, carefully selected data replaces massive parameters, and orchestration layers replace isolated models.