Google Quietly Upgrades Deep Think, ARC-AGI-2 Directly Hits 84.6%

Google DeepMind has just upgraded Deep Think, Gemini 3's dedicated reasoning mode, and its scores went straight to the top of the leaderboards.

ARC-AGI-2 is widely regarded as a frontier benchmark for AI reasoning ability, and no model had previously posted a particularly strong score on it.

The upgraded Deep Think scored 84.6%. For comparison, Claude Opus 4.6 scored 68.8%, GPT-5.2 scored 52.9%, and Google's own Gemini 3 Pro Preview scored only 31.1%.

Huge improvement.
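To put a number on that gap, here is a minimal Python sketch that computes Deep Think's lead over each other model in percentage points. The scores are the ones quoted in this article; the `margins` helper and the model labels are illustrative names chosen here, not part of any Google tooling.

```python
# ARC-AGI-2 scores (percent) as quoted in this article.
scores = {
    "Deep Think (upgraded)": 84.6,
    "Claude Opus 4.6": 68.8,
    "GPT-5.2": 52.9,
    "Gemini 3 Pro Preview": 31.1,
}


def margins(scores, leader):
    """Return the leader's lead over every other model, in percentage points."""
    top = scores[leader]
    return {
        name: round(top - value, 1)
        for name, value in scores.items()
        if name != leader
    }


# Hypothetical helper usage: compare everyone against the upgraded Deep Think.
leads = margins(scores, "Deep Think (upgraded)")
for name, gap in leads.items():
    print(f"Deep Think leads {name} by {gap} points")
```

On these figures the lead is 15.8 points over Claude Opus 4.6, 31.7 over GPT-5.2, and 53.5 over Gemini 3 Pro Preview.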

More Than Just Reasoning

Deep Think's ambition is clearly more than just reasoning.

On Humanity's Last Exam, a benchmark covering the most difficult problems in mathematics, science, and engineering, Deep Think scored 48.4%. Claude Opus 4.6 scored 40.0%, and GPT-5.2 scored 34.5%.

It's also very strong in programming:

On Codeforces, Deep Think reached an Elo rating of 3455, compared with 2512 for Gemini 3 Pro Preview and 2352 for Claude Opus 4.6.

On MMMU-Pro, a benchmark for multimodal understanding and reasoning, Deep Think also leads with 81.5%, though the gap between models is small here: Gemini 3 Pro Preview scored 81.0%, GPT-5.2 scored 79.5%, and Claude Opus 4.6 scored 73.9%.

Beyond the benchmark scores, Deep Think also achieved gold-medal-level performance on the written portions of the 2025 International Physics and Chemistry Olympiads.

To Solve Scientific Problems

Google DeepMind emphasized that the upgraded Deep Think is no longer just a problem-solving machine; it is intended to tackle real-world scientific and engineering problems.

They showcased a case from the Wang Lab at Duke University, where researchers are using Deep Think to design new semiconductor materials and to optimize the growth process for complex crystals that are candidate materials for high-temperature semiconductors.

Researchers in mechanical engineering are also using it to iterate on physical prototypes, bringing hardware iteration closer to the speed of software iteration. That means faster improvement cycles in areas such as assistive devices.

How to Use

The upgraded Deep Think mode is now rolling out to Google AI Ultra subscribers in the Gemini app.

For researchers and developers, Google has opened a Vertex AI early access program that makes the model available via API.

Vertex AI Early Access: https://goo.gle/4rMHUlq
