Alibaba Releases Qwen3-Max-Thinking Reasoning Model, Taking the International Lead in Several Performance Areas
Without waiting for GPT-5.3 or Gemini 3.5, Alibaba has jumped the gun on this week's wave of big model releases!
Last night, Alibaba launched Qwen3-Max-Thinking, the most capable flagship reasoning model in the Ali Qwen series to date. Across 19 authoritative benchmarks, Qwen3-Max-Thinking trades blows with top models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro, and when paired with test-time scaling (TTS) it reaches SOTA on quite a few of them.
Qwen3-Max-Thinking Benchmark Results
What's new about Qwen3-Max-Thinking? First, it has adaptive tool-invocation capabilities: the model can call search engines and code interpreters on demand, saving users the trouble of manually selecting tools. Perhaps out of confidence in the model's tool-calling ability, Qwen has removed the search button from the dialog box entirely.
The model also incorporates Alibaba's own test-time scaling ideas. Unlike the industry's common practice of stacking parallel reasoning paths, Qwen3-Max-Thinking doesn't just add more parallel branches; it focuses limited computational resources on the "smarter" reasoning process itself, making its reasoning more accurate, more economical, and more reflective.
In fact, Alibaba put a Qwen3-Max Preview version online as early as September last year. Compared with the Preview, the official release effectively integrates thinking and non-thinking modes. Qwen3-Max's context window is 256K; the parameter count has not been announced, but it is presumably similar to the Preview version's, i.e., over 1 trillion parameters.
Qwen3-Max-Thinking is not an open-source model. It is now live on Qwen Chat, where you can experience the model's adaptive tool calling. The Qwen3-Max-Thinking API is also open, priced at $2.5 per million input tokens and $10 per million output tokens, which remains cost-effective.
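As a quick worked example, the per-request bill under the listed pricing ($2.5 per million input tokens, $10 per million output tokens) can be computed like this (our own illustration; the function name is ours, not part of any official SDK):

```python
def qwen3_max_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call at the announced rates:
    $2.5 / 1M input tokens, $10 / 1M output tokens."""
    return input_tokens / 1_000_000 * 2.5 + output_tokens / 1_000_000 * 10.0

# Example: a 4,000-token prompt with a 2,000-token thinking-heavy reply.
cost = qwen3_max_cost_usd(4_000, 2_000)
print(f"${cost:.3f}")  # → $0.030
```

Note that reasoning models tend to emit long chains of thought, so output tokens usually dominate the bill.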
Qwen3-Max-Thinking API Call Interface
It is worth mentioning that Alibaba also open-sourced the full range of Qwen3-TTS speech-synthesis models on the same day, supporting timbre cloning, timbre creation, human-like speech generation, and voice control via natural-language descriptions.
Experience link: https://chat.qwen.AI/
API calling platform:https://bailian.console.aliyun.com/cn-beijing/?tab=model#/model-market/detail/qwen3-max-2026-01-23
I. Hands-On Test: Adaptive Search Outperforms ChatGPT, and Search and Code Interpreter Can Be Combined
We tried Qwen3-Max-Thinking as soon as it went live.
We first look at Qwen3-Max-Thinking's adaptive tool-calling capability. This is a capability that was developed through a specialized training process: after completing the initial fine-tuning of tool use, the model was further trained on diverse tasks using rule-based and model-based feedback.
Adaptive search has actually become relatively common. Both DeepSeek and ChatGPT can proactively search for queries that clearly involve real-time information, and so can Qwen3-Max-Thinking: asked about today's weather, for example, it proactively searches to give an accurate answer.
Qwen3-Max-Thinking can also invoke search on its own for content that is not obviously time-sensitive. For example, when we ask it "what is Clawdbot", the model thinks for a while, realizes it has no relevant knowledge, then starts searching and gives us a complete description.
This is something ChatGPT handles less well: it tends to assume that what is absent from its knowledge base is wrong, rather than searching to check.
For example, when we asked Qwen3-Max-Thinking to "simulate flipping a fair coin 1,000 times, count the number of heads, and verify the law of large numbers", it opened its code interpreter and wrote over 60 lines of Python to accomplish the task. The charts it generated were correct, just plainly drawn.
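For reference, the core of that coin-flip task fits in a few lines. The sketch below is our own minimal reimplementation, not the model's 60-line script:

```python
import random

def flip_fair_coin(n: int, seed: int = 0) -> int:
    """Simulate n fair coin flips and return the number of heads.
    A fixed seed makes the run reproducible."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(n))

n = 1000
heads = flip_fair_coin(n)
freq = heads / n
# Law of large numbers: the empirical frequency should be close to 0.5.
print(f"heads: {heads}/{n}, frequency: {freq:.3f}")
```

A fuller script like the model's would also plot the running frequency against the flip count to show convergence toward 0.5.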
Immediately after that, we tried to make Qwen3-Max-Thinking combine two major tools, search and code interpreter, to accomplish the task.
In the following task, Qwen3-Max-Thinking needed to look up the stock price movements of NVIDIA and AMD since 2026 and then generate a chart. Examining the thought process and code, it can be seen that although Qwen3-Max-Thinking did search, its searching was somewhat superficial: it looked at a number of different sources but failed to find stock prices for every date.
Nevertheless, the charts Qwen3-Max-Thinking ultimately generated satisfy the basic need of observing stock-price trends, and its analysis is relatively comprehensive, combining information such as market commentary and earnings reports.
II. An Efficient New Reasoning Method, with Programming Aesthetics Better Than the Preview Version
For reasoning, Alibaba equips Qwen3-Max-Thinking with an experience-accumulating, multi-round iterative test-time scaling strategy.
Unlike simply increasing the number of parallel reasoning paths, which often leads to redundant reasoning, Qwen3-Max-Thinking limits the number of paths and uses the computational resources saved for iterative self-reflection guided by an “experience extraction” mechanism.
This mechanism distills key information from past inference rounds, allowing the model to avoid repeatedly deriving known conclusions and focus on unresolved uncertainties. Compared to directly referencing the original inference trajectory, this mechanism achieves greater efficiency in context utilization, allowing fuller integration of historical information within the same context window.
The method consistently outperforms standard parallel sampling-and-aggregation at roughly the same token consumption, yielding gains of 2-4 points on a variety of reasoning benchmarks such as GPQA, HLE, and LiveCodeBench v6.
We tried to get Qwen3-Max-Thinking to build a strength-and-speed population simulator, the same test we previously ran on Qwen3-Max-Preview.
Prompt: There are two populations; population A focuses on developing strength and population B on developing speed. Model the interaction between the two populations and describe it.
Notably, given the same prompt, Qwen3-Max-Thinking prefers to use the code interpreter to draw diagrams rather than generating a web page as the Preview version did.
After we explicitly asked it to generate a web page for the simulation, Qwen3-Max-Thinking delivered the following results: a richer one-shot generation with improved UI aesthetics compared to Qwen3-Max-Preview. This may, however, be because it had already explored the topic more fully in context.
Qwen3-Max-Thinking generates results:
Qwen3-Max-Preview generates results:
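For a rough idea of what such a simulation involves, here is a minimal coupled difference-equation model of the two populations. This is our own toy illustration with invented coefficients, not the model's output, which was a full interactive web page:

```python
def simulate(steps=100, a=1.0, b=1.0):
    """Toy interaction between two populations.

    Population A (strength) wins direct clashes; population B (speed)
    evades and competes for resources. Growth rates and interaction
    coefficients are invented purely for illustration."""
    history = []
    for _ in range(steps):
        a_next = a + 0.05 * a - 0.02 * a * b  # growth minus losses to B's competition
        b_next = b + 0.06 * b - 0.03 * a * b  # growth minus losses in clashes with A
        a, b = max(a_next, 0.0), max(b_next, 0.0)
        history.append((a, b))
    return history

hist = simulate()
```

A web-page version of this would wrap the same update rule in a render loop and expose the coefficients as sliders.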
On the X platform, users have also begun trying out Qwen3-Max-Thinking's reasoning abilities. It should be noted, however, that Qwen3-Max now hides the full chain-of-thought path and instead provides a summary, which some users have found unacceptable.
AI blogger Max for AI shared that Qwen3-Max-Thinking accurately analyzed the download trends of two major open-source model families, using its reasoning ability to sidestep logical traps set by users without fabricating non-existent data.
Qwen3-Max-Thinking bypasses logic traps (Source: Max for AI@X)
Conclusion: China's Big Model Continues to Explore Efficient Reasoning Paths
In a public speech in January this year, Lin Junyang, head of Alibaba's Qwen large model, revealed that a major constraint on AI research in China is still compute: Alibaba's model-delivery work occupies a large share of its compute, and what is left over for research is not as plentiful as one might imagine.
Lin Junyang's remarks point in the same direction as the Qwen3-Max-Thinking upgrade, which, through various technical and engineering optimizations, delivers results with higher token efficiency, reducing compute demands to some extent.
Going forward, this "efficiency-first, carefully planned" approach may continue to serve as a main line of sustainable innovation for China's large models under resource constraints.
© Copyright notes
The copyright of the article belongs to the author, please do not reprint without permission.