
What is Qwen3-Next?
Qwen3-Next is the next-generation base-model architecture released by AliCloud's Tongyi team on September 12, 2025, aiming for extreme long-context capability and parameter efficiency through architectural innovation. Its core model, Qwen3-Next-80B-A3B, has 80 billion total parameters but activates only about 3 billion per inference step (an activation ratio of roughly 1:50), sharply reducing compute cost while maintaining high performance. The model supports contexts on the order of a million tokens, cuts training cost by more than 90% compared with the previous-generation dense model Qwen3-32B, improves long-text inference throughput by more than 10x, and delivers performance comparable to the flagship Qwen3 model with 235 billion parameters.
Qwen3-Next's core technology
- High sparsity MoE architecture
- Dual-track expert design: the model contains 512 expert modules; each forward pass dynamically selects 10 routed (sparse) experts plus 1 shared expert. The shared expert provides a stable computational base while the sparse experts handle specialized tasks, a "general practitioner + specialist" collaboration.
- Extreme sparsity: an activation ratio of roughly 1:50, far above the industry norm (e.g., about 1:10 for earlier Qwen3 MoE models), improving compute efficiency by over 90%.
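The routing described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Qwen3-Next's actual implementation; the names `experts`, `shared_expert`, and `router_w` are hypothetical interfaces.

```python
import numpy as np

def moe_forward(x, experts, shared_expert, router_w, k=10):
    """Toy sketch of sparse MoE routing with a shared expert.

    x            : (d,) input vector
    experts      : list of callables, one per routed expert (512 in Qwen3-Next)
    shared_expert: callable applied to every token
    router_w     : (num_experts, d) router weights (hypothetical parameterization)
    k            : number of routed experts activated per token (10 in Qwen3-Next)
    """
    logits = router_w @ x                      # router score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best-scoring experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                       # softmax over the selected experts only
    routed = sum(g * experts[i](x) for g, i in zip(gates, topk))
    return routed + shared_expert(x)           # shared expert always contributes

# Tiny demo: 8 experts that just scale their input, pick the top 2.
d, n_exp = 4, 8
rng = np.random.default_rng(0)
experts = [(lambda s: (lambda v: s * v))(s) for s in range(1, n_exp + 1)]
out = moe_forward(np.ones(d), experts, lambda v: 0.5 * v,
                  rng.normal(size=(n_exp, d)), k=2)
print(out.shape)  # (4,)
```

The key efficiency property: only `k` of the `num_experts` expert functions are ever evaluated per token, so compute scales with activated parameters, not total parameters.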
- Hybrid Attention mechanism (Hybrid Attention)
- Gated DeltaNet (linear attention): models long-range dependencies (e.g., the narrative thread of an entire book) at O(N) complexity, cutting memory consumption by roughly 50%.
- Gated Attention (standard attention): efficiently captures local information such as phrases and keywords; the two layer types are interleaved at about a 3:1 ratio (linear : standard) to balance performance and efficiency.
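A minimal sketch of what the 3:1 interleaving could look like as a layer schedule. The ratio is from the text above; the function name and layer labels are illustrative, not the model's actual configuration keys.

```python
def hybrid_layer_schedule(num_layers, ratio=(3, 1)):
    """Sketch of interleaving linear-attention (Gated DeltaNet) layers with
    standard (Gated Attention) layers at a 3:1 ratio, as described for
    Qwen3-Next. Returns labels like ['linear', 'linear', 'linear', 'full', ...].
    """
    lin, full = ratio
    block = ["linear"] * lin + ["full"] * full   # one repeating 3+1 group
    return [block[i % len(block)] for i in range(num_layers)]

schedule = hybrid_layer_schedule(8)
print(schedule)
# ['linear', 'linear', 'linear', 'full', 'linear', 'linear', 'linear', 'full']
```

Because three quarters of the layers are O(N) linear attention, memory and compute for very long sequences are dominated by the cheap layers, while the periodic full-attention layers preserve precise local retrieval.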
- Multi Token Prediction (MTP)
- During pre-training, the model predicts several future tokens at once (e.g., t+1, t+2, ..., t+n), improving its grasp of causal structure.
- At inference time this adapts naturally to speculative decoding: multiple candidate tokens are drafted in one step and verified in parallel, speeding up decoding severalfold.
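The accept/verify loop behind speculative decoding can be sketched as follows. This is a simplified greedy variant for illustration; `verify_fn` is a hypothetical stand-in for a parallel verification pass by the full model.

```python
def speculative_decode_step(draft_tokens, verify_fn):
    """Toy sketch of speculative decoding: a cheap draft head (here, the MTP
    head) proposes several tokens at once; the full model checks them and we
    keep the longest agreeing prefix, plus the model's own token at the first
    mismatch.

    draft_tokens : tokens proposed by the draft head
    verify_fn    : given the accepted prefix, returns the token the full
                   model would emit next (hypothetical interface)
    """
    accepted = []
    for tok in draft_tokens:
        expected = verify_fn(accepted)   # full model's choice at this position
        if tok != expected:
            accepted.append(expected)    # take the model's token and stop
            break
        accepted.append(tok)             # draft agreed: one token for free
    return accepted

# Demo: the full model "wants" [1, 2, 9, ...]; the draft guesses [1, 2, 3, 4].
target = [1, 2, 9, 9]
verify = lambda prefix: target[len(prefix)]
print(speculative_decode_step([1, 2, 3, 4], verify))  # [1, 2, 9]
```

In a real system all the `verify_fn` calls happen in one batched forward pass, which is why accepting several draft tokens per step multiplies decoding speed.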
- Training stability optimization
- Zero-Centered RMSNorm: Impose constraints on normalization layer weights to avoid gradient explosion or vanishing and improve training stability.
- MoE routing initialization optimization: ensures expert modules are selected without bias early in training, reducing initialization perturbations.
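A sketch of the idea behind Zero-Centered RMSNorm. Assumption (inferred from the name, not a confirmed implementation detail): the learnable gain is stored as an offset `w` around zero and applied as `1 + w`, so weight decay pulls the effective gain toward 1 rather than toward 0.

```python
import numpy as np

def zero_centered_rmsnorm(x, w, eps=1e-6):
    """Sketch of Zero-Centered RMSNorm.

    x : (..., d) activations
    w : (d,) zero-centered gain offset; the effective scale is (1 + w),
        so regularizing w toward 0 keeps the gain near 1 (assumed detail).
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * (1.0 + w)

x = np.array([3.0, -4.0])               # rms = sqrt((9 + 16) / 2) = sqrt(12.5)
y = zero_centered_rmsnorm(x, np.zeros(2))
print(y)
```

With `w = 0` the output has unit RMS by construction, which is the stable operating point the zero-centered parameterization is meant to protect.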
Scenarios for Qwen3-Next
- Long Text Processing
- Legal document analysis: contexts up to around a million tokens allow complete parsing of long documents such as contracts and judgments.
- Review of scientific literature: Efficiently process long papers and lab reports, extract key information and generate summaries.
- Efficient Reasoning
- Real-time interactive applications: the low activated-parameter count lets the model run well even on domestically produced (Chinese) compute hardware, making it suitable for intelligent customer service, online education, and similar scenarios.
- Low latency generation: MTP technology accelerates the decoding process and improves conversation smoothness.
- complex reasoning task
- Math and programming: scores 87.8 on the AIME25 math reasoning benchmark, approaching SOTA; outperforms the flagship open-source Qwen model on the LiveCodeBench coding benchmark.
- Multi-step logic chain construction: Reasoning models (Thinking versions) excel at solving problems that require step-by-step reasoning, such as logic puzzles and strategic planning.
Qwen3-Next project address
- Official web version: chat.qwen.ai
- HuggingFace: huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
- Kaggle: kaggle.com/models/qwen-lm/qwen3-next-80b
Recommended Reasons
- Ultimate price/performance ratio
- Training costs are cut by more than 90% and inference throughput rises more than 10x, significantly lowering the barrier to enterprise AI adoption.
- technological leadership
- Innovative technologies such as Hybrid Attention Mechanism, High Sparsity MoE, and MTP represent the cutting edge of the industry and set a new standard for long context processing.
- Open Source Ecological Advantage
- More than 170,000 derivative models have been built on the Qwen (Tongyi Qianwen) family, the most of any model series worldwide; developers can quickly customize applications on top of the open-source code and weights.
- Strong scenario adaptability
- It supports diverse scenarios from long text analysis to real-time interaction, covering a wide range of industries such as law, scientific research, education, and customer service.