Google Quietly Upgrades Deep Think, ARC-AGI-2 Score Hits 84.6%
Google DeepMind has just upgraded Deep Think, Gemini 3's dedicated reasoning mode, and its scores now top the leaderboards.

ARC-AGI-2 is widely regarded as a cutting-edge benchmark for AI reasoning ability, and until now no model had scored especially well on it.

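The upgraded Deep Think scored 84.6%. For comparison, Claude Opus 4.6 scored 68.8%, GPT-5.2 scored 52.9%, and Google's own Gemini 3 Pro Preview scored only 31.1%.
That is a huge improvement: 15.8 percentage points ahead of the next-best model listed, and 53.5 points ahead of Gemini 3 Pro Preview.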
More Than Just Reasoning
Deep Think's ambition is clearly more than just reasoning.

On Humanity's Last Exam, a benchmark built from some of the most difficult problems in mathematics, science, and engineering, Deep Think scored 48.4%. Claude Opus 4.6 scored 40.0%, and GPT-5.2 scored 34.5%.
It's also very strong in programming:
On Codeforces, Deep Think reached Elo 3455, while Gemini 3 Pro Preview was 2512, and Claude Opus 4.6 was 2352.
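
To put the rating gap in perspective, here is a minimal sketch that applies the standard Elo expected-score formula, E = 1 / (1 + 10^((R_b - R_a) / 400)), to the ratings quoted above. It assumes the reported figures behave like textbook Elo ratings. Codeforces' actual rating system may differ in detail, and the function and variable names are illustrative only.

```python
# Sketch only: applies the textbook Elo expected-score formula to the ratings
# quoted in the article. Codeforces' real rating system may differ in detail.

def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings as reported above.
RATINGS = {
    "Deep Think": 3455,
    "Gemini 3 Pro Preview": 2512,
    "Claude Opus 4.6": 2352,
}

def expected_scores_against(leader: str, ratings: dict[str, int]) -> dict[str, float]:
    """Expected score of `leader` against every other entry in `ratings`."""
    return {
        name: elo_expected_score(ratings[leader], rating)
        for name, rating in ratings.items()
        if name != leader
    }

if __name__ == "__main__":
    for opponent, score in expected_scores_against("Deep Think", RATINGS).items():
        print(f"Deep Think vs {opponent}: expected score {score:.4f}")
```

Under this model, both expected scores come out above 0.99. In other words, a gap of more than 900 rating points corresponds to the higher-rated side being expected to win almost every head-to-head contest.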

On MMMU-Pro, a benchmark for multimodal understanding and reasoning, Deep Think also leads with 81.5%, though the gap between models is much narrower here: Gemini 3 Pro Preview scored 81.0%, GPT-5.2 scored 79.5%, and Claude Opus 4.6 scored 73.9%.

Beyond benchmark scores, Deep Think also achieved gold-medal-level performance on the written portions of the 2025 Physics and Chemistry Olympiads.
To Solve Scientific Problems
Google DeepMind stressed that the upgraded Deep Think is no longer just a problem-solving machine: it is intended to tackle real-world scientific and engineering problems.

One showcased example is the Wang Lab at Duke University, where researchers are using Deep Think to design new semiconductor materials and to optimize the growth process for complex crystals that are candidate materials for high-temperature semiconductors.


Mechanical engineering researchers are also using it to iterate on physical prototypes, bringing hardware iteration closer to the speed of software iteration. That means faster improvement cycles in areas such as assistive devices.
How to Use
The upgraded Deep Think mode is now rolling out to Google AI Ultra subscribers in the Gemini app.

For researchers and developers, Google has opened a Vertex AI early access program that provides access via API.
Vertex AI Early Access: https://goo.gle/4rMHUlq





