OpenAI marks its 10th anniversary by unveiling its most powerful model to date, GPT-5.2. Sam Altman states: "We will build superintelligence within a decade."

Early this morning, on OpenAI's 10th anniversary, the company officially launched its most powerful model to date, GPT-5.2, making it available in both ChatGPT and the API at the same time.

This update includes GPT-5.2 Instant, Thinking, and Pro versions. Starting today, it will be gradually rolled out to users on the Plus, Pro, Business, and Enterprise paid plans. Free and Go users are expected to gain access tomorrow. GPT-5.2 has also been added to the API and Codex for developers to call.

The existing GPT-5.1 will remain available in ChatGPT as a transitional version for paying users for three months, after which it will be officially taken offline. OpenAI states that GPT-5.2 is part of its continuously improving model series, and that subsequent iterations will keep working on known issues such as excessive refusals and response latency.

On the API side, GPT-5.2 Thinking corresponds to gpt-5.2, Instant corresponds to gpt-5.2-chat-latest, and Pro corresponds to gpt-5.2-pro. Developers can call these directly.
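The naming above can be captured in a small lookup table. The following is a minimal sketch: the three model identifiers come from the article, while the dictionary name and the `api_model_for` helper are hypothetical names introduced only for illustration and are not part of any OpenAI SDK.

```python
# Mapping of ChatGPT variant names to API model identifiers, as reported above.
# The dictionary and helper names are illustrative assumptions.
CHATGPT_TO_API_MODEL = {
    "Thinking": "gpt-5.2",
    "Instant": "gpt-5.2-chat-latest",
    "Pro": "gpt-5.2-pro",
}


def api_model_for(variant: str) -> str:
    """Return the API model identifier for a ChatGPT variant name."""
    key = variant.strip().capitalize()  # accept "pro", " PRO ", "Pro", ...
    if key not in CHATGPT_TO_API_MODEL:
        known = ", ".join(sorted(CHATGPT_TO_API_MODEL))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}")
    return CHATGPT_TO_API_MODEL[key]


if __name__ == "__main__":
    for name in ("Thinking", "Instant", "Pro"):
        print(f"{name:>8} -> {api_model_for(name)}")
```

Keeping the mapping in one place means a later rename only has to be changed once.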

▲Image source: OpenAI official blog

In terms of price, calling GPT-5.2 costs more than the previous generation. Input is $1.75 per million tokens (approximately RMB 12.35) and output is $14 per million tokens (approximately RMB 98.81). GPT-5.2 Pro is priced at $21 and $168 per million tokens for input and output respectively (approximately RMB 148 and RMB 1,185). The model also supports, for the first time, a fifth level of reasoning effort: xhigh.
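To make the per-million-token prices concrete, here is a minimal cost estimator. The prices are the ones quoted in the article; the function name, the table layout, and the example token counts are assumptions for illustration, and the sketch treats the $21 and $168 figures as GPT-5.2 Pro's input and output prices. It ignores any discounts or cached-input pricing, which the article does not discuss.

```python
# USD per one million tokens, as (input_price, output_price).
# Prices are taken from the article; everything else is an illustrative assumption.
PRICES_USD_PER_MILLION = {
    "gpt-5.2": (1.75, 14.0),
    "gpt-5.2-pro": (21.0, 168.0),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES_USD_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_USD_PER_MILLION:
        cost = estimate_cost_usd(model, 10_000, 2_000)
        print(f"{model}: ${cost:.4f}")
```

At these list prices, the same request costs twelve times as much on GPT-5.2 Pro as on GPT-5.2, since both the input and output prices differ by that factor.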

▲Image source: OpenAI official blog

OpenAI co-founder and CEO Sam Altman announced on social platform X the performance of GPT-5.2 across multiple cutting-edge benchmarks: 55.6% on SWE-Bench Pro, 52.9% on ARC-AGI-2, and 40.3% on FrontierMath.

▲Image source: X platform

These benchmarks are primarily used to measure a model's performance on complex code fixing, general reasoning, and challenging mathematical tasks. On these measures, GPT-5.2 has further improved its stability on high-level tasks.

According to OpenAI's official blog, GPT-5.2 outperformed industry professionals on clearly defined knowledge-work tasks covering 44 occupations. Compared with GPT-5.1 Thinking, GPT-5.2 Thinking is significantly stronger across knowledge-work tasks, programming, scientific problems, mathematics, and abstract reasoning. In particular, it achieved a perfect score on AIME 2025, a top-tier mathematics competition, and on OpenAI's professional work benchmark GDPval it outperformed or matched human experts on 70.9% of tasks.

▲Image source: OpenAI official blog

OpenAI team member Yann Dubois also posted on the social platform X, stating that the design of GPT-5.2 Thinking focuses on “tasks with higher economic value” (such as coding, spreadsheets, and presentation documents).

▲Image source: X platform

Additionally, across eight benchmarks including SWE-Bench Pro and GPQA Diamond, GPT-5.2 Thinking's scores surpass those of both Google Gemini 3 Pro and Anthropic Claude Opus 4.5.

▲Image source: OpenAI

It is worth mentioning that GPT-5.2 demonstrates significantly enhanced capabilities in handling multimodal tasks, closing in on Gemini. Cursor, the top AI programming assistant, announced the immediate rollout of GPT-5.2.

Meanwhile, Microsoft Chairman and CEO Satya Nadella announced that GPT-5.2 will be fully integrated into Microsoft 365 Copilot, GitHub Copilot, and Foundry.

▲Image source: X platform

At the GPT-5.2 launch event, Fidji Simo, Head of Applications at OpenAI, also confirmed that the long-awaited ChatGPT “Adult Mode” is expected to launch in the first quarter of 2026. Simo stated that before rolling out this mode, OpenAI aims to ensure its age prediction model is mature enough to accurately identify underage users while avoiding misclassifying adults.

Currently, this age prediction model has undergone early testing in several countries, primarily for automatically applying different content restrictions and security policies.

I. Professional Task Competency Surge, Achieving “Expert-Level” Scoring for the First Time

According to official disclosures from OpenAI, GPT-5.2 Thinking achieved “expert-level” performance for the first time in the GDPval evaluation covering 44 occupations, surpassing or matching industry professionals on 70.9% of tasks. GPT-5.2 Pro further improved to 74.1%. When considering only tasks where it achieved a “clear win,” GPT-5.2 Thinking scored 49.8%, while GPT-5.2 Pro reached 60%.
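The two GDPval figures for each model can be combined into a rough breakdown. The sketch below assumes that the “surpassing or matching” rate is simply the sum of clear wins and ties, so that the tie share is the difference between the two reported numbers. That decomposition is an assumption made here for illustration, not something OpenAI states, and the function name is hypothetical.

```python
def gdpval_breakdown(win_or_tie_pct: float, clear_win_pct: float) -> dict:
    """Split a win-or-tie rate into wins, ties, and losses (percentage points).

    Assumes win-or-tie = clear wins + ties, which is an illustrative
    reading of the reported figures rather than an official definition.
    """
    if not 0 <= clear_win_pct <= win_or_tie_pct <= 100:
        raise ValueError("expected 0 <= clear_win_pct <= win_or_tie_pct <= 100")
    return {
        "win": clear_win_pct,
        "tie": round(win_or_tie_pct - clear_win_pct, 1),
        "loss": round(100 - win_or_tie_pct, 1),
    }


if __name__ == "__main__":
    # Figures as reported in the article for GDPval.
    print("GPT-5.2 Thinking:", gdpval_breakdown(70.9, 49.8))
    print("GPT-5.2 Pro:     ", gdpval_breakdown(74.1, 60.0))
```

Under this reading, most of Pro's gain over Thinking comes from converting ties into clear wins rather than from reducing losses.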

This evaluation covers multiple real-world business outputs including sales presentations, budget models, operational schedules, and manufacturing flowcharts. GPT-5.2 demonstrated strong performance across these tasks, generating output roughly 11 times faster than human experts at less than 1% of the cost.

In investment research tasks, GPT-5.2 Thinking demonstrated strong performance in internal evaluations across scenarios such as investment banking financial statement models and leveraged buyout models. Its average score was 68.4%, a clear improvement over GPT-5.1 Thinking's 59.1%, with GPT-5.2 Pro's score rising further to 71.7%.

▲Image source: OpenAI official blog

▲Comparison of GPT-5.1 Thinking and GPT-5.2 Thinking Performance

II. Comprehensive Upgrades to Code, Tool Invocation, and Long-Running Tasks

In terms of coding ability, GPT-5.2 Thinking achieved 55.6% on the more rigorous SWE-bench Pro (spanning four languages and emphasizing real-world engineering complexity) and reached 80% on SWE-bench Verified, both significantly outperforming GPT-5.1's 50.8% and 76.3%. On the SWE-Lancer IC Diamond task, GPT-5.2 Thinking achieved 74.6% (compared to GPT-5.1's 69.7%).

▲Image source: OpenAI official blog

At the same time, GPT-5.2 has appeared on the leaderboard of the AI benchmark platform lmarena.ai (Arena), scoring 1486 points in the WebDev test. It ranked second, just three points behind the leader, and outperformed mainstream models such as Claude-opus-4-5 and Gemini-3-pro. Another GPT-5.2 variant ranked sixth with a score of 1399.

According to Arena documentation, GPT-5.2 was previously tested internally under the codenames “robin” and “robin-high.” Its scores differ by only one point from GPT-5-medium. These results remain preliminary and are expected to stabilize further as more testing data accumulates.

From an evaluation perspective, Arena primarily measures a model's end-to-end coding capabilities in deployable web application scenarios. GPT-5.2 has demonstrated its practicality in handling complex task sequences.

In terms of factual accuracy, GPT-5.2 Thinking achieves an error-free response rate of 93.9% on ChatGPT-based queries (with search mode enabled), an improvement over GPT-5.1's 91.2%. Without search mode, the rate also increases from 87.3% to 88%.

▲Image source: OpenAI official blog

Another key change is improved reliability in tool invocation and long-running tasks.

On Tau-2 Bench Telecom, GPT-5.2 Thinking achieved a top score of 98.7%. Even with reasoning turned off, it significantly outperforms the previous generation, and in the noisier Retail scenario accuracy improved from 77.9% to 82%. In the more general-purpose tool-use evaluation BrowseComp, GPT-5.2 Thinking achieved 65.8%, while the Pro version reached 77.9%, both surpassing GPT-5.1's 50.8%.

▲Image source: OpenAI official blog

OpenAI mentioned that both GPT-5.2 Thinking and Pro support a fifth reasoning-effort level, xhigh, which is suited to professional task scenarios involving long processes, multiple steps, and high precision.

III. GPT-5.2 Delivers Comprehensive Enhancements in Long Context and Visual Comprehension

In terms of long-context capabilities, GPT-5.2 Thinking is comprehensively superior to the previous generation on OpenAI MRCRv2. In the 8-needle test, it significantly outperforms GPT-5.1 across the entire range from 4k to 256k tokens: it achieves 98.2% at lengths of 4k–8k and still maintains 77.0% at lengths of 128k–256k, while GPT-5.1 scores only 29.6% to 47.8% by comparison.

In other long-text scenarios, GPT-5.2 Thinking reached 92.0% and 89.8% on BrowseComp Long Context at 128k and 256k respectively. In the GraphWalks task, GPT-5.2 Thinking achieved 94.0% on the bfs subset and 89.0% on the parents subset, representing significant improvements over GPT-5.1's 76.8% and 71.5%.

▲Image source: OpenAI official blog

In terms of visual comprehension, GPT-5.2 Thinking achieved 82.1% on the CharXiv scientific diagram reasoning task in tool-free mode, further improving to 88.7% when Python tools were enabled. In ScreenSpot-Pro interface comprehension, GPT-5.2 Thinking scored 86.3%, significantly outperforming GPT-5.1's 64.2%. In the more challenging Video MMMU task, which combines video and multimodal elements, performance also improved from 82.9% to 85.9%.

These gains make the model more reliable when processing specialized visual inputs such as scientific charts, operational dashboards, and product interface screenshots.

▲Image source: OpenAI official blog

IV. Microsoft's Suite of Products Upgrades in Tandem, with GPT-5.2 Emerging as the Next-Generation “Productivity Model”

With the release of GPT-5.2, Microsoft Chairman and CEO Satya Nadella also announced on the social platform X that GPT-5.2 will be fully integrated into Microsoft 365 Copilot, GitHub Copilot, and Foundry, and will serve as the new “default reasoning model” for more workflow scenarios.

In Microsoft 365 Copilot, users can now enable GPT-5.2 through the model selector for high-complexity tasks such as meeting note analysis, document reasoning, market research, and strategic planning. Nadella stated that by integrating the model with users' work data, GPT-5.2 can better leverage its reasoning capabilities.

On GitHub Copilot, GPT-5.2 excels at long-context reasoning and complex codebase review, focusing on engineering use cases such as cross-file relationship analysis, dependency tracing, and refactoring suggestions.

In addition, GPT-5.2 has been integrated into Microsoft Foundry and Copilot Studio. Developers can directly invoke the GPT-5.2 model when building automated workflows, deploying internal agents, or doing independent development. The consumer-facing Copilot will also begin a phased update shortly, gradually replacing the current version.

▲Image source: X platform

Within the Microsoft ecosystem, GPT-5.2 has been positioned as the “default productivity model,” serving a broader range of development, writing, and analysis tasks across different product lines through automated model selection.

Additionally, Cursor, the top-tier AI programming assistant, has swiftly integrated GPT-5.2 and continues to use OpenAI's official API pricing.

▲Image source: Cursor

Conclusion: The capability boundaries of GPT-5.2 are converging toward “stability and practicality.”

From multiple public benchmarks to Arena's evaluation of end-to-end capabilities for web applications, GPT-5.2's overall performance converges toward stable reliability and task completion.

With the introduction of the multi-tier capability system comprising Instant, Thinking, and Pro tiers, GPT-5.2 has been segmented into clearer use cases across different workflows. Its comprehensive integration within the Microsoft ecosystem further reinforces this directional shift. Whether performing cross-document reasoning in M365 Copilot or handling long-context code chains in GitHub Copilot, GPT-5.2 is increasingly integrated into higher-frequency, more specific task workflows.

In addition to launching cutting-edge models for professional tasks and AI agents, OpenAI announced it has reached a licensing agreement with Disney, allowing Sora 2 users to incorporate Disney characters into images they generate and share. Disney will invest $1 billion (approximately RMB 7.1 billion) in OpenAI and retain the option to increase its stake in the future.
