Qwen3-Next


Alibaba open-sources an 80-billion-parameter large model with 1:50 ultra-sparse activation and million-token contexts; costs drop by 90% while performance rivals hundred-billion-parameter models.

Language: zh, en
Collection time: 2025-09-12

What is Qwen3-Next?

Qwen3-Next is the next-generation base model architecture released by Alibaba Cloud's Tongyi team on September 12, 2025, aiming to deliver extreme long-context capability and parameter efficiency through architectural innovation. Its core model, Qwen3-Next-80B-A3B, has 80 billion total parameters but activates only 3 billion during inference (a 1:50 activation ratio), which significantly reduces computational cost while maintaining high performance. The model supports ultra-long contexts at the million-token level, cuts training cost by more than 90% compared with the previous-generation dense model Qwen3-32B, improves long-text inference throughput by more than 10x, and performs on par with the 235-billion-parameter flagship Qwen3 model.

Qwen3-Next's core technologies

  1. High-sparsity MoE architecture (a routing sketch follows this list)
    • Dual-track expert design: the model contains 512 expert modules, of which 10 routed (sparse) experts plus 1 shared expert are dynamically selected for each token. The shared expert provides a stable computational base while the sparse experts handle specialized tasks, a "general practitioner + specialist" collaboration.
    • Extreme sparsity: the activation ratio reaches 1:50, well above the industry average (e.g., 1:10 for Qwen3), improving computational efficiency by over 90%.
  2. Hybrid attention mechanism (Hybrid Attention; a delta-rule sketch follows this list)
    • Gated DeltaNet (linear attention): models long-range dependencies (e.g., the thread of an entire book) at O(N) complexity, cutting memory consumption by about 50%.
    • Gated Attention (standard attention): efficiently captures local information such as phrases and keywords; the two layer types are mixed at a 3:1 ratio to balance performance and efficiency.
  3. Multi-Token Prediction (MTP; a speculative-decoding sketch follows this list)
    • During pre-training, the model predicts multiple future tokens (t+1, t+2, ..., t+n) simultaneously, improving its understanding of causal structure.
    • At inference time this pairs with speculative decoding: multiple candidate tokens are generated at once and verified in parallel, speeding up decoding severalfold.
  4. Training stability optimizations (a normalization sketch follows this list)
    • Zero-Centered RMSNorm: constrains the normalization-layer weights to avoid exploding or vanishing gradients and improve training stability.
    • MoE router initialization: ensures experts are selected without bias early in training, reducing initialization perturbations.
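
The sketch below illustrates the "shared + routed experts" layer from item 1. Only the 512-expert pool, top-10 routing, and always-on shared expert come from the description above; every name and size here (Expert, SharedExpertMoE, D_MODEL, D_FF) is a hypothetical stand-in, not Qwen3-Next's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EXPERTS = 512          # routed expert pool (from the description above)
TOP_K = 10                 # routed experts activated per token (plus 1 shared)
D_MODEL, D_FF = 2048, 512  # hypothetical sizes, for illustration only

class Expert(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D_MODEL, D_FF), nn.SiLU(),
                                 nn.Linear(D_FF, D_MODEL))
    def forward(self, x):
        return self.net(x)

class SharedExpertMoE(nn.Module):
    def __init__(self):
        super().__init__()
        self.router = nn.Linear(D_MODEL, NUM_EXPERTS, bias=False)
        self.experts = nn.ModuleList(Expert() for _ in range(NUM_EXPERTS))
        self.shared = Expert()  # always-on "general practitioner"

    def forward(self, x):                    # x: (tokens, D_MODEL)
        top_w, top_i = self.router(x).topk(TOP_K, dim=-1)
        top_w = F.softmax(top_w, dim=-1)     # normalize over chosen experts
        out = self.shared(x)                 # stable base from the shared expert
        for slot in range(TOP_K):            # add the routed "specialists"
            for e in top_i[:, slot].unique():
                m = top_i[:, slot] == e
                out[m] = out[m] + top_w[m, slot, None] * self.experts[int(e)](x[m])
        return out
```

The unbiased router initialization mentioned in item 4 would apply to `router` here, keeping early expert selection close to uniform.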
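Item 2's O(N) claim comes from replacing softmax attention with a recurrent fast-weight update. Below is a hedged sketch of a DeltaNet-style delta rule in its sequential form; production kernels are chunk-parallel, and `beta` (the per-token write strength produced by a gate) is assumed here rather than taken from Qwen3-Next's code.

```python
import torch

def delta_rule_recurrence(q, k, v, beta):
    """Sequential form of the delta rule behind DeltaNet-style linear
    attention: a (d x d) fast-weight state S is updated once per token,
    so cost grows linearly with sequence length instead of quadratically.
    q, k, v: (seq, d); beta: (seq,) per-token write strength in [0, 1]."""
    seq, d = q.shape
    S = q.new_zeros(d, d)                  # fast-weight memory
    out = torch.empty_like(v)
    for t in range(seq):
        kt, vt, bt = k[t], v[t], beta[t]
        # S <- S (I - b k k^T) + b v k^T: erase the old value stored under
        # this key, then write the new one
        S = S - bt * torch.outer(S @ kt, kt) + bt * torch.outer(vt, kt)
        out[t] = S @ q[t]                  # read the memory with the query
    return out
```

In the hybrid stack, three such linear-attention layers would alternate with one standard gated-attention layer, matching the 3:1 mix described above.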
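Item 3's decoding speed-up can be made concrete with a single greedy speculative-decoding step. `draft` and `target` below are placeholder callables returning logits (in Qwen3-Next's case the MTP head would play the drafting role); this simplified sketch uses greedy verification and is not the production algorithm.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ids, k=4):
    """One speculative-decoding step. target/draft map token ids of shape
    (1, len) to logits of shape (1, len, vocab)."""
    L = ids.shape[1]
    cand = ids
    for _ in range(k):                       # 1) cheap draft proposes k tokens
        nxt = draft(cand)[:, -1].argmax(-1, keepdim=True)
        cand = torch.cat([cand, nxt], dim=-1)
    logits = target(cand)                    # 2) one parallel full-model pass
    checks = logits[:, L - 1:-1].argmax(-1)  # target's pick at each draft slot
    drafted = cand[:, L:]
    agree = (checks == drafted)[0].int()     # 3) accept longest agreeing prefix
    n_ok = int(agree.cumprod(0).sum())
    fix = checks[:, n_ok:n_ok + 1]           # 4) target's own token at the
    return torch.cat([ids, drafted[:, :n_ok], fix], dim=-1)  # first mismatch
```

Each verification pass yields at least one new token even when every draft is rejected, and several tokens are often accepted at once, which is where the severalfold speed-up comes from.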
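Zero-Centered RMSNorm (item 4) is commonly implemented by storing the learnable gain as a zero-initialized offset around 1; the sketch below reflects that general pattern and is an assumption, not the model's exact code.

```python
import torch
import torch.nn as nn

class ZeroCenteredRMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.zeros(dim))  # zero-centered gain

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * (1.0 + self.weight)          # effective gain starts at 1
```

Parameterizing the gain around zero means weight decay pulls the effective scale toward 1 rather than toward 0, which helps keep normalization-layer weights bounded during long training runs.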

Scenarios for Qwen3-Next

  1. Long Text Processing
    • Legal document analysis: million-token contexts support complete parsing of long documents such as contracts and judgments.
    • Scientific literature review: efficiently processes long papers and lab reports, extracting key information and generating summaries.
  2. Efficient Inference
    • Real-time interactive applications: the low activation-parameter count lets it run efficiently on domestic compute hardware, making it suitable for intelligent customer service, online education, and similar scenarios.
    • Low-latency generation: MTP accelerates the decoding process and improves conversational fluency.
  3. Complex reasoning tasks
    • Math and programming: scores 87.8 on the AIME25 math-reasoning benchmark, approaching SOTA; outperforms the flagship open-source Qwen model on the LiveCodeBench coding benchmark.
    • Multi-step logic-chain construction: the reasoning (Thinking) versions excel at problems that require step-by-step reasoning, such as logic puzzles and strategic planning.

Qwen3-Next project address

Reasons to Recommend

  1. Extreme cost-effectiveness
    • Training costs fall by more than 90% and inference throughput rises more than 10x, significantly lowering the barrier to enterprise AI adoption.
  2. Technological leadership
    • Innovations such as the hybrid attention mechanism, high-sparsity MoE, and MTP represent the industry's cutting edge and set a new standard for long-context processing.
  3. Open-source ecosystem advantage
    • More than 170,000 models have been derived from Tongyi Qianwen (Qwen), the most of any model family worldwide, and developers can quickly customize applications based on the open-source code.
  4. Strong scenario adaptability
    • Supports diverse scenarios from long-text analysis to real-time interaction, covering industries such as law, scientific research, education, and customer service.
