
What is Qwen3-Next?
Qwen3-Next is a next-generation foundation model architecture released by Alibaba Cloud's Tongyi team on September 12, 2025. It aims for extreme long-context capability and parameter efficiency through architectural innovation. Its core model, Qwen3-Next-80B-A3B, has 80 billion total parameters but activates only about 3 billion per token during inference (with an expert activation ratio of about 1:50 in the MoE layers), which significantly reduces computational cost while maintaining high performance. The model supports ultra-long contexts (natively 262,144 tokens, extensible to about 1 million), cuts training cost by more than 90% compared with the previous-generation dense model Qwen3-32B, and raises long-text inference throughput by more than 10 times, while delivering performance comparable to the 235-billion-parameter flagship Qwen3 model.
Qwen3-Next's core technology
- High sparsity MoE architecture
- Dual-track expert design: The model contains 512 expert modules. For each token, the router dynamically selects 10 routed (sparse) experts, and 1 shared expert is always active. The shared expert provides a stable computational base, while the routed experts handle specialized tasks, realizing a "general practitioner + specialist" collaboration.
- Extreme sparsity: An expert activation ratio of about 1:50 (10 routed experts out of 512), far sparser than the industry norm (e.g., 1:10 for Qwen3), improves computational efficiency by more than 90%.
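The routing described above can be sketched in a few lines. This is a minimal illustration, not the model's actual implementation: the expert counts (512 routed, 10 selected, 1 shared) come from the text, while the toy scalar experts, the random router logits, and the choice to softmax-normalize over the selected experts are assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 512  # routed experts, per the description above
TOP_K = 10         # routed experts activated per token


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Assumption: weights are normalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_output(x, router_logits, routed_experts, shared_expert):
    """Shared expert always runs; routed experts are added with router weights."""
    y = shared_expert(x)
    for idx, w in route(router_logits):
        y += w * routed_experts[idx](x)
    return y


# Hypothetical toy experts: expert i scales its scalar input by (i + 1) / 512.
routed_experts = [lambda x, s=(i + 1) / NUM_EXPERTS: s * x
                  for i in range(NUM_EXPERTS)]
shared_expert = lambda x: x

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores

selected = route(logits)
y = moe_output(2.0, logits, routed_experts, shared_expert)
active_fraction = TOP_K / NUM_EXPERTS  # about 0.0195, i.e. roughly 1:50
```

Only the 10 selected experts and the shared expert are evaluated for a token; the other 502 are skipped, which is where the compute saving comes from.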
- Hybrid attention mechanism
- Gated DeltaNet (linear attention): Models long-range dependencies (e.g., the narrative thread of an entire book) with O(N) complexity and reduces memory consumption by 50%.
- Gated Attention: Efficiently captures local information (e.g., phrases, keywords). The two layer types are mixed in a 3:1 ratio (three Gated DeltaNet layers for each Gated Attention layer) to balance performance and efficiency.
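To make the O(N) claim concrete, here is a minimal sketch of a gated delta-rule recurrence, the idea behind Gated DeltaNet-style linear attention. It assumes the commonly published form of the update, S ← α · S (I − β k kᵀ) + β v kᵀ, with a fixed-size state S. The scalar gates `alpha` and `beta`, the tiny dimensions, and the 8-layer layout are illustrative assumptions, not the model's actual configuration.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One token update: S <- alpha * S (I - beta k k^T) + beta v k^T.

    S is a d_v x d_k matrix (list of rows); its size never grows with
    sequence length, so processing N tokens costs O(N).
    """
    d_k = len(k)
    new_S = []
    for i in range(len(S)):
        # What the memory currently returns for key k (row i of S k).
        pred = sum(S[i][j] * k[j] for j in range(d_k))
        new_S.append([alpha * (S[i][j] - beta * pred * k[j]) + beta * v[i] * k[j]
                      for j in range(d_k)])
    return new_S


def read(S, q):
    """Query the memory: o = S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


S = [[0.0, 0.0], [0.0, 0.0]]
key_a = [1.0, 0.0]

# Write a value under key_a, then overwrite it: the delta rule replaces
# the old association instead of accumulating on top of it.
S = gated_delta_step(S, key_a, [3.0, 4.0], alpha=1.0, beta=1.0)
first_read = read(S, key_a)
S = gated_delta_step(S, key_a, [5.0, 6.0], alpha=1.0, beta=1.0)
second_read = read(S, key_a)

# With alpha < 1 the gate decays old memory while a new key is written.
S = gated_delta_step(S, [0.0, 1.0], [1.0, 1.0], alpha=0.5, beta=1.0)
decayed_read = read(S, key_a)

# Hypothetical 8-layer stack following the 3:1 mix described above.
layer_types = (["gated_deltanet"] * 3 + ["gated_attention"]) * 2
```

Because the state has a fixed size, memory use does not grow with context length, unlike the key-value cache of standard attention.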
- Multi Token Prediction (MTP)
- During pre-training, the model predicts multiple future tokens (e.g., t+1, t+2, ..., t+n) at the same time, which improves its understanding of causal relationships.
- During inference, MTP is paired with speculative decoding: multiple candidate tokens are generated at once and validated in parallel, speeding up decoding several-fold.
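The draft-then-verify loop behind speculative decoding can be sketched as follows. This is a simplified greedy variant under stated assumptions: `draft_next` stands in for a cheap multi-token predictor and `target_next` for the full model, both hypothetical toy functions over integer tokens. In a real system the verification calls are batched into one forward pass.

```python
def speculative_step(prefix, draft_next, target_next, n_draft=3):
    """Draft n_draft tokens cheaply, keep the longest prefix the target agrees with."""
    # 1) Draft: propose tokens one after another with the cheap predictor.
    proposed, ctx = [], list(prefix)
    for _ in range(n_draft):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2) Verify: compare each proposal with the target model's greedy choice.
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token, stop
            break
        accepted.append(tok)
        ctx.append(tok)
    else:
        accepted.append(target_next(ctx))  # every draft accepted: one bonus token
    return accepted


def generate(prefix, draft_next, target_next, max_new=8):
    """Repeat speculative steps; returns (tokens, number_of_verification_steps)."""
    out, steps = list(prefix), 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next))
        steps += 1
    return out[:len(prefix) + max_new], steps


# Hypothetical toy models: the target counts upward modulo 10;
# the draft agrees except right after token 4, where it guesses wrong.
target_next = lambda ctx: (ctx[-1] + 1) % 10
draft_next = lambda ctx: 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10

tokens_a, steps_a = generate([1], draft_next, target_next)
tokens_b, steps_b = generate([3], draft_next, target_next)
```

The output always matches what the target model would have produced on its own; the draft only changes how many target passes are needed (here 2 or 3 passes for 8 new tokens instead of 8).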
- Training stability optimization
- Zero-Centered RMSNorm: Impose constraints on normalization layer weights to avoid gradient explosion or vanishing and improve training stability.
- MoE router initialization optimization: Ensures that expert modules are selected without bias early in training, reducing perturbations caused by initialization.
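As a rough illustration of the zero-centered normalization idea, the sketch below assumes the common parameterization in which the learnable weight is stored as an offset `gamma` around zero and applied as `1 + gamma`, so that weight decay pulls the effective scale toward 1 rather than toward 0. The function names, learning rate, and decay values are hypothetical.

```python
import math


def zero_centered_rmsnorm(x, gamma, eps=1e-6):
    """RMS-normalize x, then scale by (1 + gamma); gamma is initialized to zero."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * (1.0 + g) for v, g in zip(x, gamma)]


def apply_weight_decay(gamma, lr=0.1, weight_decay=0.1):
    """One decoupled weight-decay step: shrinks gamma toward 0 (scale toward 1)."""
    return [g * (1.0 - lr * weight_decay) for g in gamma]


x = [3.0, 4.0]
y = zero_centered_rmsnorm(x, gamma=[0.0, 0.0])
out_rms = math.sqrt(sum(v * v for v in y) / len(y))  # close to 1 at initialization

drifted = [0.5, -0.5]                 # weights that have drifted during training
shrunk = apply_weight_decay(drifted)  # decay moves them back toward zero
```

Keeping the norm weights centered on zero is one way to constrain them: unusually large scales are penalized directly, which is the kind of constraint the bullet above describes.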
Scenarios for Qwen3-Next
- Long Text Processing
- Analysis of legal documents: Ultra-long context support allows complete parsing of long documents such as contracts and court judgments.
- Review of scientific literature: Efficiently process long papers and lab reports, extract key information and generate summaries.
- Efficient Inference
- Real-time interactive applications: The low-activation design lets the model perform well on domestic (Chinese-made) compute hardware, making it suitable for intelligent customer service, online education, and similar scenarios.
- Low-latency generation: MTP accelerates decoding and makes conversations feel smoother.
- Complex Reasoning Tasks
- Math and programming: Scores 87.8 on the AIME25 math reasoning benchmark, approaching SOTA levels, and outperforms the flagship open-source Qwen model on the LiveCodeBench coding benchmark.
- Multi-step logic chain construction: The reasoning (Thinking) variant excels at problems that require step-by-step reasoning, such as logic puzzles and strategic planning.
Qwen3-Next project address
- Official web version: chat.qwen.ai
- HuggingFace: huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
- Kaggle: kaggle.com/models/qwen-lm/qwen3-next-80b
Recommended Reasons
- Ultimate price/performance ratio
- Training costs drop by more than 90% and inference throughput rises by more than 10 times, significantly lowering the threshold for enterprise AI adoption.
- Technological leadership
- Innovative technologies such as Hybrid Attention Mechanism, High Sparsity MoE, and MTP represent the cutting edge of the industry and set a new standard for long context processing.
- Open Source Ecological Advantage
- More than 170,000 derivative models have been built on Tongyi Qianwen (Qwen), the most of any model family in the world, and developers can quickly customize applications based on the open-source code.
- Strong scenario adaptability
- It supports diverse scenarios from long text analysis to real-time interaction, covering a wide range of industries such as law, scientific research, education, and customer service.
Related Navigation

Google introduces advanced AI models with powerful reasoning capabilities, multimodal support, and ultra-long context windows for multiple scenarios such as academic research, software development, creative work, and enterprise applications.

Xiaomi MiMo
Xiaomi's open-source 7-billion-parameter reasoning model, which narrowly outperforms models such as OpenAI o1-mini in mathematical reasoning and code competitions.

Ovis2
Alibaba's open source multimodal large language model with powerful visual understanding, OCR, video processing and reasoning capabilities, supporting multiple scale versions.

Yanxi Large Model
A large model developed by JD.com on the basis of industrial data and technology, with broad industry application capabilities, aimed at providing efficient, intelligent solutions for enterprises.

Claude 3.7 Max
Anthropic's top-of-the-line AI model for hardcore developers, built to tackle ultra-complex tasks with powerful code processing and a 200k context window.

Wenxin Large Model 4.5 (ERNIE 4.5)
Baidu's self-developed, natively multimodal foundation model, with excellent multimodal understanding, text generation, and logical reasoning capabilities. It uses a number of advanced techniques, costs only 1% as much as GPT-4.5, and is planned to be fully open-sourced.

DeepSeek-VL2
Developed by the DeepSeek team, it is an efficient vision-language model based on a Mixture-of-Experts (MoE) architecture, with powerful multimodal understanding and processing capabilities.

Evo 2
The world's largest biology AI model, jointly developed by multiple top organizations. It is trained on massive genetic data and can accurately predict genetic variants and generate sequences, supporting breakthroughs in the life sciences.
