The Cost Wall of GPT
Over the past week, a new focus has emerged in the discussion about GPT on X: not capability, but cost.
ARC-AGI: The Boundary of Intelligence
The performance of the most cutting-edge models on ARC-AGI-2:
| Model | ARC-AGI-2 Score |
|---|---|
| GPT-5.2 Pro | ~54% |
| GPT-5.2 Refine | ~73% |
| Human | 100% |
The gap between 54% and 73% is not an intelligence gap but a "refinement" gap: letting the model repeatedly check and revise its own answers. Each extra round of checking takes more computation, which means higher cost.
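A rough sketch of why refinement multiplies cost. The mechanism below (generate-and-check rounds) and all numbers in it are illustrative assumptions, not published details of any real model; the point is only that token usage, and therefore cost, grows roughly linearly with the number of self-checking rounds.

```python
# Illustrative sketch: "refinement" modeled as repeated generate-and-check passes.
# PRICE_PER_M_TOKENS and TOKENS_PER_PASS are hypothetical placeholders,
# not published figures for any real model.

PRICE_PER_M_TOKENS = 10.0   # hypothetical blended $ per million tokens
TOKENS_PER_PASS = 4_000     # hypothetical tokens for one answer plus one self-check

def refinement_cost(passes: int) -> float:
    """Dollar cost of solving one task with `passes` generate-and-check rounds."""
    total_tokens = passes * TOKENS_PER_PASS
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS

one_shot = refinement_cost(1)   # answer once, no self-checking
refined = refinement_cost(8)    # eight rounds of self-checking
print(f"one-shot: ${one_shot:.2f}, 8x refined: ${refined:.2f}")
```

Whatever the placeholder prices, eight rounds cost eight times one round: the accuracy bought by refinement is paid for linearly in compute.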
The Real Cost of Agents
Annual cost of 24/7 enterprise-level agents (20 million input + 20 million output tokens per day):
| Model | Annual Cost |
|---|---|
| Palmyra X5 | ~$48K |
| GPT-5.2 Standard | ~$57K |
| Gemini 2.5 Pro | ~$82K |
| Claude Sonnet 4.5 | ~$131K |
| Claude Opus 4.6 | ~$219K |
| GPT-5.2 Pro | ~$690K |
GPT-5.2 Pro costs roughly 12 times as much as GPT-5.2 Standard. This is not a pricing strategy issue, but a cost structure issue.
"Before you deploy 100 AI agents, run the math." — @waseem_s
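Running that math is a few lines of arithmetic: daily token volume times per-million-token prices, times 365. The prices below are hypothetical placeholders, not any provider's actual rates; substitute real list prices to reproduce the figures in the table.

```python
# Annual agent cost from daily token volume and per-million-token pricing.
# The example prices are hypothetical placeholders; plug in real list prices.

DAYS_PER_YEAR = 365
INPUT_TOKENS_PER_DAY = 20_000_000
OUTPUT_TOKENS_PER_DAY = 20_000_000

def annual_cost(price_in_per_m: float, price_out_per_m: float) -> float:
    """Yearly cost in dollars for a 24/7 agent at the stated daily volume."""
    daily = (INPUT_TOKENS_PER_DAY / 1e6) * price_in_per_m \
          + (OUTPUT_TOKENS_PER_DAY / 1e6) * price_out_per_m
    return daily * DAYS_PER_YEAR

# Example: a model priced at a hypothetical $1.50 in / $6.00 out per million tokens
print(f"${annual_cost(1.50, 6.00):,.0f} per year")  # prints "$54,750 per year"
```

At those placeholder rates a single always-on agent lands in the tens of thousands of dollars a year; 100 such agents multiply every row of the table by 100.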
The New Turing Test
A simple question is becoming the new intelligence test:
"The car wash is 40 meters from my house. I want to wash my car. Should I walk or drive?"
Models that passed: GPT-5.2 Thinking, Opus 4.6, Gemini 3 Pro
Models that failed: GPT-5.2 Instant, GPT-4o, Haiku 4.5, Sonnet 4.5
Why is this test meaningful? Because it tests "common sense reasoning" rather than "knowledge retrieval." 40 meters is normally walking distance, but the goal is to wash the car, and the car cannot be washed while it sits at home: it has to be driven there. Models that pattern-match "short distance, so walk" miss the actual constraint.
History Doesn't Repeat, But It Rhymes
"Expert systems were born in the 1970s, flourished in the 1980s, and were widely regarded as the future of AI." — @ChombaBupe
GPT models were born in 2018, flourished in the 2020s, and are widely regarded as the future of AI.
The failure of expert systems was not because they weren't smart enough, but because the maintenance cost was too high and the scalability was too poor. When the knowledge base requires manual maintenance, scale is the enemy.
GPT faces a mirrored problem: the models are smart, but the cost of reasoning is too high. When every request requires a lot of computation, scale is also the enemy.
Next Steps
Multiple new models are expected to be released this week: Gemini 3.1 Pro, Claude Sonnet 5, GPT-5.3, DeepSeek V4, Qwen 3.5.
The competition is shifting from "who is smarter" to "who is cheaper." This is good news for users. For OpenAI? Not necessarily.