
What is Qwen3-Next?
Qwen3-Next is a next-generation base model architecture released by Alibaba Cloud's Tongyi (Qwen) team on September 12, 2025, which aims to push context length and parameter efficiency through architectural innovation. Its core model, Qwen3-Next-80B-A3B, has 80 billion total parameters but activates only about 3 billion per inference step (a 1:50 activation ratio), sharply reducing compute costs while maintaining strong performance. The model supports million-token-scale ultra-long contexts, cuts training cost by more than 90% compared with the previous-generation dense model Qwen3-32B, improves long-text inference throughput by more than 10x, and performs comparably to the flagship 235-billion-parameter Qwen3 model.
Qwen3-Next's Core Technologies
- High-sparsity MoE architecture
- Dual-track expert design: the model contains 512 expert modules; each inference step dynamically selects 10 sparse (routed) experts plus 1 shared expert. The shared expert provides a stable computational base while the routed experts handle specialized tasks, realizing a "general practitioner + specialist" collaboration.
- Extreme sparsity: the activation ratio reaches 1:50, well above the industry norm (e.g., 1:10 for Qwen3), improving computational efficiency by more than 90%.
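The shared-plus-routed expert pattern described above can be sketched in a few lines. This is a minimal illustrative toy, not Qwen3-Next's actual implementation: the expert count, dimensions, and per-token dispatch loop are all simplifications chosen for clarity.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SparseMoE:
    """Toy MoE layer: one always-active shared expert plus top-k routed experts.

    Illustrative only; real implementations batch the dispatch and use
    gated FFN experts rather than plain linear maps."""

    def __init__(self, d_model=16, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.router = rng.standard_normal((d_model, n_experts)) * 0.02
        self.experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.02
        self.shared = rng.standard_normal((d_model, d_model)) * 0.02
        self.top_k = top_k

    def __call__(self, x):                      # x: (tokens, d_model)
        logits = x @ self.router                # router scores per expert
        idx = np.argsort(-logits, axis=-1)[:, :self.top_k]   # top-k experts
        gates = softmax(np.take_along_axis(logits, idx, axis=-1))
        out = x @ self.shared                   # shared "generalist" path
        for t in range(x.shape[0]):             # per-token dispatch (clarity over speed)
            for g, e in zip(gates[t], idx[t]):
                out[t] += g * (x[t] @ self.experts[e])
        return out
```

Because only `top_k` of the `n_experts` weight matrices are touched per token, the compute cost scales with the activated parameters, not the total parameter count; that is the source of the 1:50 efficiency gain.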
- Hybrid Attention mechanism
- Gated DeltaNet (linear attention): models long-range dependencies (e.g., the narrative thread of an entire book) at O(N) complexity, cutting memory consumption by about 50%.
- Gated Attention: efficiently captures local information (e.g., phrases and keywords); the two layer types are interleaved at a 3:1 ratio to balance performance and efficiency.
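Two sketches make the hybrid design concrete: the 3:1 layer interleaving, and why linear attention is O(N). Both are illustrative assumptions about the general technique, not Qwen3-Next's actual code; in particular, DeltaNet's gating and feature maps are omitted.

```python
import numpy as np

def layer_schedule(n_layers, pattern=("linear", "linear", "linear", "full")):
    """Illustrative 3:1 interleaving: three linear-attention layers
    for every one full (gated) attention layer."""
    return [pattern[i % len(pattern)] for i in range(n_layers)]

def causal_linear_attention(q, k, v):
    """O(N) causal linear attention: maintain a running d_k x d_v state
    instead of materializing an N x N score matrix."""
    state = np.zeros((k.shape[1], v.shape[1]))
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        state += np.outer(k[t], v[t])   # fold key/value t into the summary
        out[t] = q[t] @ state           # read the summary with query t
    return out
```

The loop does constant work per token regardless of how far back the dependencies reach, which is why memory and compute stay flat as the context grows into the millions of tokens.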
- Multi-Token Prediction (MTP)
- In pre-training, the model predicts multiple future tokens at once (e.g., t+1, t+2, ..., t+n), strengthening its grasp of causal structure.
- At inference, MTP pairs with speculative decoding: multiple candidate tokens are generated in one pass and verified in parallel, speeding up decoding several-fold.
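The draft-then-verify loop above can be sketched as follows. Here `draft_next` stands in for a cheap MTP-style draft head and `verify` for the main model; both are hypothetical callables mapping a token sequence to the next token, and the sequential "verification" loop stands in for what a real system does in one parallel forward pass.

```python
def speculative_decode(draft_next, verify, prompt, n_draft=3, max_new=12):
    """Sketch of speculative decoding: draft n_draft tokens cheaply,
    keep the longest prefix the main model agrees with, then take one
    token from the main model and repeat."""
    seq = list(prompt)
    while len(seq) < len(prompt) + max_new:
        # 1) draft several candidate tokens with the cheap head
        draft, ctx = [], list(seq)
        for _ in range(n_draft):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) verify the drafts against the main model
        accepted = 0
        for i, t in enumerate(draft):
            if verify(seq + draft[:i]) == t:
                accepted += 1
            else:
                break
        seq += draft[:accepted]
        # 3) always emit one token from the main model (the correction,
        #    or the next token if every draft was accepted)
        seq.append(verify(seq))
    return seq
```

When the draft head agrees with the main model most of the time, each main-model step emits several tokens instead of one, which is where the multi-fold decoding speedup comes from.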
- Training stability optimizations
- Zero-Centered RMSNorm: constrains the normalization-layer weights to avoid gradient explosion or vanishing, improving training stability.
- MoE router initialization optimization: ensures experts are selected without bias early in training, reducing initialization perturbations.
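A common way to implement a zero-centered RMSNorm, sketched below under the assumption that the learned gain is stored as an offset around zero and applied as (1 + gamma), so that weight decay pulls the gain toward 1 rather than toward 0. Details may differ from Qwen3-Next's actual implementation.

```python
import numpy as np

def zero_centered_rmsnorm(x, gamma, eps=1e-6):
    """RMSNorm with a zero-centered gain parameter.

    x:     (..., d) activations
    gamma: (d,) learned offset around 0; effective gain is (1 + gamma)"""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * (1.0 + gamma)
```

With `gamma` initialized at zero the layer starts as a pure RMS normalization (unit gain), which keeps activation scales bounded from the first step of training.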
Scenarios for Qwen3-Next
- Long text processing
- Legal document analysis: million-token contexts allow complete parsing of long documents such as contracts and judgments.
- Scientific literature review: efficiently processes long papers and lab reports, extracting key information and generating summaries.
- Efficient inference
- Real-time interactive applications: the low activation-parameter count lets it perform well on domestic compute hardware, suiting intelligent customer service, online education, and similar scenarios.
- Low-latency generation: MTP accelerates decoding and improves conversational fluency.
- Complex reasoning tasks
- Math and programming: scores 87.8 on the AIME25 math reasoning benchmark, approaching SOTA; outperforms the flagship open-source Qwen model on the LiveCodeBench coding benchmark.
- Multi-step logic chains: the reasoning (Thinking) variants excel at problems requiring step-by-step reasoning, such as logic puzzles and strategic planning.
Qwen3-Next project addresses
- Official web version: chat.qwen.ai
- HuggingFace: huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
- Kaggle: kaggle.com/models/qwen-lm/qwen3-next-80b
Recommended Reasons
- Extreme cost-effectiveness
- Training costs drop by more than 90% and inference throughput rises more than 10x, significantly lowering the threshold for enterprise AI adoption.
- Technological leadership
- Innovative technologies such as Hybrid Attention Mechanism, High Sparsity MoE, and MTP represent the cutting edge of the industry and set a new standard for long context processing.
- Open-source ecosystem advantage
- More than 170,000 models have been derived from Tongyi Qianwen (Qwen), the most of any model family worldwide, and developers can quickly build custom applications on the open-source code.
- Strong scenario adaptability
- It supports diverse scenarios from long text analysis to real-time interaction, covering a wide range of industries such as law, scientific research, education, and customer service.
