Alibaba Releases Qwen3-Max-Thinking Reasoning Model, Taking the International Lead in Several Performance Areas
Without waiting for GPT-5.3 or Gemini 3.5, Alibaba has jumped the gun on this week's wave of big model releases!
Last night, Alibaba launched Qwen3-Max-Thinking, the most capable flagship reasoning model in the Ali Qwen series to date. Across 19 authoritative benchmarks, Qwen3-Max-Thinking trades blows with top models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro, and when paired with test-time scaling (TTS) it reaches SOTA on quite a few of them.
Qwen3-Max-Thinking Benchmark Results
What's new about Qwen3-Max-Thinking? First, it has adaptive tool-invocation capabilities: the model can call search engines and code interpreters on demand, saving users the trouble of manually selecting tools. Perhaps out of confidence in the model's tool-calling ability, Qwen has removed the search button from the dialog box entirely.
The model also incorporates Alibaba's own test-time scaling ideas. Unlike the industry's common practice of stacking parallel reasoning paths, Qwen3-Max-Thinking doesn't just add more parallel branches; it focuses limited computational resources on the "smarter" reasoning process itself, making its reasoning more accurate, more economical, and more reflective.
In fact, Alibaba put a Qwen3-Max Preview version online as early as September last year. Compared with the Preview, the official release effectively integrates thinking and non-thinking modes. Qwen3-Max's context window is 256K; the parameter count has not been announced, but it is presumably similar to the Preview version's, i.e., over 1 trillion parameters.
Qwen3-Max-Thinking is not an open-source model. It is now live on Qwen Chat, where you can experience the model's adaptive tool calling. The Qwen3-Max-Thinking API is also open, priced at $2.5 per million input tokens and $10 per million output tokens, which remains cost-effective.
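As a quick worked example, the per-request bill under the listed pricing ($2.5 per million input tokens, $10 per million output tokens) can be computed like this (our own illustration; the function name is ours, not part of any official SDK):

```python
def qwen3_max_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call at the announced rates:
    $2.5 / 1M input tokens, $10 / 1M output tokens."""
    return input_tokens / 1_000_000 * 2.5 + output_tokens / 1_000_000 * 10.0

# Example: a 4,000-token prompt with a 2,000-token thinking-heavy reply.
cost = qwen3_max_cost_usd(4_000, 2_000)
print(f"${cost:.3f}")  # → $0.030
```

Note that reasoning models tend to emit long chains of thought, so output tokens usually dominate the bill.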
Qwen3-Max-Thinking API Call Interface
It is worth mentioning that Alibaba also open-sourced the full range of Qwen3-TTS speech-synthesis models on the same day, supporting timbre cloning, timbre creation, human-like speech generation, and voice control via natural-language descriptions.
Experience link: https://chat.qwen.AI/
API calling platform:https://bailian.console.aliyun.com/cn-beijing/?tab=model#/model-market/detail/qwen3-max-2026-01-23
I. Hands-On Test: Adaptive Search Outperforms ChatGPT, and Search and Code Interpreter Can Be Combined
We tried Qwen3-Max-Thinking as soon as it went live.
We first look at Qwen3-Max-Thinking's adaptive tool-calling capability. This is a capability that was developed through a specialized training process: after completing the initial fine-tuning of tool use, the model was further trained on diverse tasks using rule-based and model-based feedback.
Adaptive search has actually become relatively common. Both DeepSeek and ChatGPT can proactively search for queries that clearly involve real-time information, and so can Qwen3-Max-Thinking: asked about today's weather, for example, it proactively searches to give an accurate answer.
Qwen3-Max-Thinking can also invoke search on its own for content that is not obviously time-sensitive. For example, when we ask it "what is Clawdbot", the model thinks for a while, realizes it has no relevant knowledge, then starts searching and gives us a complete description.
This is something ChatGPT handles less well: it tends to assume that what is absent from its knowledge base is wrong, rather than searching to check.
For example, when we asked Qwen3-Max-Thinking to "simulate flipping a fair coin 1,000 times, count the number of heads, and verify the law of large numbers", it opened its code interpreter and wrote over 60 lines of Python to accomplish the task. The charts it generated were correct, just plainly drawn.
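For reference, the core of that coin-flip task fits in a few lines. The sketch below is our own minimal reimplementation, not the model's 60-line script:

```python
import random

def flip_fair_coin(n: int, seed: int = 0) -> int:
    """Simulate n fair coin flips and return the number of heads.
    A fixed seed makes the run reproducible."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(n))

n = 1000
heads = flip_fair_coin(n)
freq = heads / n
# Law of large numbers: the empirical frequency should be close to 0.5.
print(f"heads: {heads}/{n}, frequency: {freq:.3f}")
```

A fuller script like the model's would also plot the running frequency against the flip count to show convergence toward 0.5.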
Immediately after that, we tried to make Qwen3-Max-Thinking combine two major tools, search and code interpreter, to accomplish the task.
In the following task, Qwen3-Max-Thinking needed to look up the stock price movements of NVIDIA and AMD since 2026 and then generate a chart. Examining the thought process and code, it can be seen that although Qwen3-Max-Thinking did search, its searching was somewhat superficial: it looked at a number of different sources but failed to find stock prices for every date.
Nevertheless, the charts Qwen3-Max-Thinking ultimately generated satisfy the basic need of observing stock-price trends, and its analysis is relatively comprehensive, combining information such as market commentary and earnings reports.
II. An Efficient New Reasoning Method, with Programming Aesthetics Better Than the Preview Version
For reasoning, Alibaba equips Qwen3-Max-Thinking with an experience-accumulating, multi-round iterative test-time scaling strategy.
Unlike simply increasing the number of parallel reasoning paths, which often leads to redundant reasoning, Qwen3-Max-Thinking limits the number of paths and uses the computational resources saved for iterative self-reflection guided by an “experience extraction” mechanism.
This mechanism distills key information from past inference rounds, allowing the model to avoid repeatedly deriving known conclusions and focus on unresolved uncertainties. Compared to directly referencing the original inference trajectory, this mechanism achieves greater efficiency in context utilization, allowing fuller integration of historical information within the same context window.
The method consistently outperforms standard parallel sampling-and-aggregation at roughly the same token consumption, yielding gains of 2-4 points on a variety of reasoning benchmarks such as GPQA, HLE, and LiveCodeBench v6.
We tried to get Qwen3-Max-Thinking to build a strength-and-speed population simulator, the same test we previously ran on Qwen3-Max-Preview.
Prompt: There are two populations; population A focuses on developing strength and population B on developing speed. Model the interaction between the two populations and describe it.
Notably, given the same prompt, Qwen3-Max-Thinking prefers to use the code interpreter to draw diagrams rather than generating a web page as the Preview version did.
After we explicitly asked it to generate a web page for the simulation, Qwen3-Max-Thinking delivered the following results: a richer one-shot generation with improved UI aesthetics compared to Qwen3-Max-Preview. This may, however, be because it had already explored the topic more fully in context.
Qwen3-Max-Thinking generates results:
Qwen3-Max-Preview generates results:
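For a rough idea of what such a simulation involves, here is a minimal coupled difference-equation model of the two populations. This is our own toy illustration with invented coefficients, not the model's output, which was a full interactive web page:

```python
def simulate(steps=100, a=1.0, b=1.0):
    """Toy interaction between two populations.

    Population A (strength) wins direct clashes; population B (speed)
    evades and competes for resources. Growth rates and interaction
    coefficients are invented purely for illustration."""
    history = []
    for _ in range(steps):
        a_next = a + 0.05 * a - 0.02 * a * b  # growth minus losses to B's competition
        b_next = b + 0.06 * b - 0.03 * a * b  # growth minus losses in clashes with A
        a, b = max(a_next, 0.0), max(b_next, 0.0)
        history.append((a, b))
    return history

hist = simulate()
```

A web-page version of this would wrap the same update rule in a render loop and expose the coefficients as sliders.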
On the X platform, users have also begun trying out Qwen3-Max-Thinking's reasoning abilities. It should be noted, however, that Qwen3-Max now hides the full chain-of-thought path and instead provides a summary, which some users have found unacceptable.
AI blogger Max for AI shared that Qwen3-Max-Thinking accurately analyzed the download trends of two major open-source model families, using its reasoning ability to sidestep logical traps set by users without fabricating non-existent data.
Qwen3-Max-Thinking bypasses logic traps (Source: Max for AI@X)
Conclusion: China's Big Model Continues to Explore Efficient Reasoning Paths
In a public speech in January this year, Lin Junyang, head of Alibaba's Qwen large model, revealed that a major constraint on AI research in China is still compute: Alibaba's model-delivery work occupies a large share of its compute, and what is left over for research is not as plentiful as one might imagine.
Lin Junyang's remarks point in the same direction as the Qwen3-Max-Thinking upgrade, which, through various technical and engineering optimizations, delivers results with higher token efficiency, reducing compute demands to some extent.
Going forward, this "efficiency-first, carefully planned" approach may continue to serve as a main line of sustainable innovation for China's large models under resource constraints.
© Copyright notes
The copyright of the article belongs to the author, please do not reprint without permission.