
What is Qwen3-Next?
Qwen3-Next is the next-generation base model architecture released by Alibaba Cloud's Tongyi (Qwen) team on September 12, 2025, aiming to achieve extreme context-processing power and parameter efficiency through architectural innovation. Its core model, Qwen3-Next-80B-A3B, has 80 billion total parameters but activates only about 3 billion during inference (an activation ratio of about 1:50), which significantly reduces compute costs while maintaining high performance. The model supports million-token-scale ultra-long contexts, cuts training cost by more than 90% compared with the previous-generation dense model Qwen3-32B, and raises long-text inference throughput by more than 10x, with performance comparable to the 235-billion-parameter Qwen3 flagship.
Qwen3-Next's core technology
- High sparsity MoE architecture
- Dual-track expert design: the model contains 512 expert modules, with 10 sparse (routed) experts plus 1 shared expert dynamically selected for each inference step. The shared expert provides a stable computational base while the sparse experts handle specialized tasks, realizing "general practitioner + specialist" collaboration.
- Extreme sparsity: the activation parameter ratio reaches 1:50, well above the industry average (e.g., about 1:10 for earlier Qwen3 MoE models), improving computational efficiency by more than 90%.
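The routing idea above can be sketched in a few lines. This is a scaled-down, hypothetical illustration (tiny dimensions, random weights, 8 experts instead of 512) of top-k expert routing plus an always-on shared expert; it is not the actual Qwen3-Next implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative, scaled-down sizes (the real model uses 512 experts,
# 10 routed + 1 shared per token).
D_MODEL, N_EXPERTS, TOP_K = 16, 8, 2

# Each "expert" here is just a small linear map.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
shared_expert = rng.standard_normal((D_MODEL, D_MODEL)) * 0.1
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route x to TOP_K sparse experts plus the always-on shared expert."""
    logits = x @ router                    # router score for every expert
    top = np.argsort(logits)[-TOP_K:]      # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    out = shared_expert @ x                # shared expert: stable base signal
    for w, i in zip(weights, top):
        out += w * (experts[i] @ x)        # sparse experts: specialized signal
    return out

x = rng.standard_normal(D_MODEL)
y = moe_layer(x)
print(y.shape)  # (16,)
```

Only TOP_K + 1 of the N_EXPERTS + 1 expert matrices are touched per token, which is the source of the low activation ratio.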
- Hybrid attention mechanism
- Gated DeltaNet (linear attention): models long-range dependencies (e.g., the narrative thread of an entire book) at O(N) complexity, reducing memory consumption by about 50%.
- Gated Attention: efficiently captures local information (e.g., phrases and keywords); the two attention types are mixed at a 3:1 ratio (three DeltaNet layers for every standard attention layer) to balance performance and efficiency.
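The 3:1 mix can be expressed as a simple layer schedule. A minimal sketch, assuming (as described above) three linear-attention layers per full-attention layer; the function name and layer labels are illustrative, not from the Qwen3-Next codebase:

```python
# Build a hybrid layer schedule: for every full (gated) attention layer,
# insert three linear-attention (Gated DeltaNet) layers.
def hybrid_schedule(n_layers: int, linear_per_full: int = 3) -> list:
    period = linear_per_full + 1
    return ["full_attention" if (i + 1) % period == 0 else "linear_deltanet"
            for i in range(n_layers)]

layout = hybrid_schedule(12)
print(layout.count("linear_deltanet"), layout.count("full_attention"))  # 9 3
```

With this schedule, 75% of layers run in O(N) linear attention (cheap, long-range), while the remaining 25% retain full attention for precise local retrieval.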
- Multi Token Prediction (MTP)
- During pre-training, the model predicts multiple future tokens (t+1, t+2, ..., t+n) simultaneously, improving its grasp of causal structure.
- During inference, it pairs with speculative decoding: multiple candidate tokens are generated at once and verified in parallel, accelerating decoding several-fold.
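The draft-then-verify loop behind speculative decoding can be illustrated with toy stand-in models. Both functions below are hypothetical placeholders, not the real Qwen3-Next MTP head or target model; the point is the accept-the-longest-verified-prefix mechanic:

```python
def draft_model(prefix):
    # Hypothetical fast drafter: proposes the next 4 tokens (last one is wrong
    # on purpose, to show the fix-up path).
    return [prefix[-1] + 1, prefix[-1] + 2, prefix[-1] + 3, prefix[-1] + 99]

def target_next(prefix):
    # Hypothetical target model: the "true" next token is always last + 1.
    return prefix[-1] + 1

def speculative_step(prefix):
    """One decoding step: draft several tokens, keep the verified prefix."""
    draft = draft_model(prefix)
    accepted = []
    ctx = list(prefix)
    for tok in draft:                          # verification runs in parallel in practice
        if target_next(ctx) == tok:            # draft token matches the target model
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target_next(ctx))  # fix-up token from the target, then stop
            break
    return prefix + accepted

out = speculative_step([0])
print(out)  # [0, 1, 2, 3, 4]
```

One verification pass here yields four tokens instead of one, which is where the "several-fold" decoding speedup comes from when the draft is usually right.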
- Training stability optimization
- Zero-Centered RMSNorm: constrains normalization-layer weights (keeping them zero-centered) to avoid gradient explosion or vanishing and improve training stability.
- MoE router initialization optimization: ensures expert modules are selected without bias early in training, reducing initialization perturbations.
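A minimal sketch of a zero-centered RMSNorm, assuming the common "(1 + weight)" parametrization: the learned scale is stored zero-centered, so regularization pulls it toward 0 (an effective scale of 1), keeping weight magnitudes bounded. Details here are illustrative, not the exact Qwen3-Next formulation:

```python
import numpy as np

def zero_centered_rmsnorm(x, weight, eps=1e-6):
    """Normalize by root-mean-square, then scale by (1 + zero-centered weight)."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * (1.0 + weight)   # effective scale stays near 1

x = np.array([3.0, -4.0])   # rms = sqrt((9 + 16) / 2) ~ 3.536
w = np.zeros(2)             # fresh initialization: identity scaling
y = zero_centered_rmsnorm(x, w)
print(np.round(y, 3))
```

At initialization (weight = 0) the layer is a pure RMS normalization, which is exactly the well-behaved starting point the stability argument relies on.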
Scenarios for Qwen3-Next
- Long Text Processing
- Legal document analysis: supports million-token-scale contexts for complete parsing of long documents such as contracts and judgments.
- Scientific literature review: efficiently processes long papers and lab reports, extracting key information and generating summaries.
- Efficient Reasoning
- Real-time interactive applications: the low activation-parameter design lets it run efficiently even on domestic (Chinese) compute hardware, making it suitable for intelligent customer service, online education, and similar scenarios.
- Low latency generation: MTP technology accelerates the decoding process and improves conversation smoothness.
- Complex reasoning tasks
- Math and programming: scores 87.8 on the AIME25 math-reasoning benchmark, approaching SOTA; outperforms the flagship open-source Qwen model on the LiveCodeBench coding benchmark.
- Multi-step logic-chain construction: the reasoning (Thinking) variants excel at problems requiring step-by-step reasoning, such as logic puzzles and strategic planning.
Qwen3-Next project address
- Official web version: chat.qwen.ai
- HuggingFace: huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
- Kaggle: kaggle.com/models/qwen-lm/qwen3-next-80b
Recommended Reasons
- Ultimate price/performance ratio
- Training cost is reduced by more than 90% and inference throughput increased more than 10x, significantly lowering the threshold for enterprise AI adoption.
- Technological leadership
- Innovative technologies such as Hybrid Attention Mechanism, High Sparsity MoE, and MTP represent the cutting edge of the industry and set a new standard for long context processing.
- Open-source ecosystem advantage
- More than 170,000 derivative models have been built on Qwen (Tongyi Qianwen), the most of any model family worldwide, and developers can quickly customize applications based on the open-source code.
- Strong scenario adaptability
- It supports diverse scenarios from long text analysis to real-time interaction, covering a wide range of industries such as law, scientific research, education, and customer service.
Related Navigation

A multimodal large model independently developed by CloudScience, with real-time learning, synchronous feedback, and cross-modal interaction capabilities; widely used in finance, security, government affairs, and other industries to advance the adoption of AI applications.

Outlier AI
A platform connecting human experts with AI model development, using human expertise to improve the quality and reliability of generative AI.

Qwen3-Max-Preview
Alibaba's flagship large model with trillion-scale parameters, supporting ultra-long context, multilingual understanding, and strong reasoning and coding capabilities; built for complex tasks and enterprise-grade applications.

Hunyuan T1
Tencent's self-developed deep-thinking model, offering fast response, ultra-long text processing, and strong reasoning capabilities; widely used in intelligent Q&A, document processing, and other fields.

Doubao
A self-developed large model launched by ByteDance, validated across 50+ internal business scenarios and continuously refined through daily usage exceeding 100 billion tokens; it provides multimodal capabilities and high-quality model performance to help enterprises build rich business experiences.

Kling LM
Kuaishou's self-developed advanced video-generation model, supporting high-quality video generation from text descriptions to help users efficiently create artistic video content.

360Brain
A comprehensive large model independently developed by 360, integrating multimodal technology with strong generation, creative, and logical-reasoning capabilities to provide enterprises with a full range of AI services.

Feishu Ask
An AI conversational search and Q&A tool launched by Feishu, designed to help users quickly integrate and retrieve knowledge resources within Feishu.
