AliCloud Heavily Upgrades Its Full-Stack AI System: A One-Article Guide to the Yunqi Conference Technology Releases


On September 24, 2025, at the Yunqi Conference, Alibaba Group CEO Wu Yongming announced that AliCloud has heavily upgraded its full-stack AI system, delivering technology updates spanning everything from the Tongyi AI large models to the AI infrastructure. Facing the new round of the intelligence revolution, AliCloud will make every effort to become a full-stack AI service provider.

In the AI era, the large model will be the next-generation operating system, and the super AI cloud will be the next-generation computer. Wu Yongming believes that a super AI cloud requires super-large-scale infrastructure and full-stack technology accumulation, and that in the future there may be only 5-6 super cloud computing platforms in the world. AliCloud will continue to increase investment to meet the arrival of the super AI era.


Seven large-model releases in a row: Qwen3-Max ranks among the world's top three, surpassing GPT-5

Centered on the large models and the AI cloud, AliCloud Intelligence CTO Zhou Jingren announced a series of heavyweight technology updates at the 2025 Yunqi Conference. The Tongyi large model family saw seven consecutive releases, achieving breakthroughs in model intelligence, Agent tool calling, coding ability, deep reasoning, multimodality, and more.


[2025 Yunqi Conference, AliCloud CTO Zhou Jingren released a number of heavyweight technology updates]

Among the large language models, Ali Tongyi's flagship model Qwen3-Max has been newly unveiled, with performance exceeding GPT-5, Claude Opus 4, and others, ranking among the top three in the world. Qwen3-Max comes in two versions, Instruct and Thinking; its preview version already ranked No. 3 on the Chatbot Arena leaderboard, and the official release is expected to climb further.

Qwen3-Max is the largest and strongest base model in the Tongyi Qianwen (Qwen) family, pre-trained on 36T tokens with more than one trillion total parameters, and it possesses strong coding and Agent tool-calling abilities. In SWE-Bench Verified, which tests how well large models solve real-world problems with code, the Instruct version scored 69.6 points, placing it in the world's first tier; in Tau2-Bench, which focuses on Agent tool calling, Qwen3-Max achieved a breakthrough score of 74.8, surpassing Claude 4 and DeepSeek-V3.1. The Qwen3-Max reasoning model also performs remarkably: combining tool calling with parallel inference, its reasoning ability hit a record high, achieving full scores of 100 on the math-focused AIME 25 and HMMT benchmarks, a first for a Chinese model.

The next-generation base model architecture Qwen3-Next and its accompanying model series were officially released. With 80B total parameters, the model activates only 3B, yet its performance is comparable to the flagship Qwen3-235B model, a major breakthrough in model computing efficiency. Qwen3-Next is designed for the future trend of large models continually scaling in both context length and total parameters, with innovative improvements such as a hybrid attention mechanism, a high-sparsity MoE structure, and a multi-token prediction (MTP) mechanism. Its training cost is reduced by more than 90% compared with the dense Qwen3-32B model, and long-text inference throughput is increased by more than 10 times, setting a new standard for the training and inference efficiency of future large models.
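The "80B total, 3B active" claim comes from sparse mixture-of-experts routing: a small gating network picks a few experts per token, so compute scales with the number of selected experts rather than the total. Below is a minimal, self-contained sketch of top-k MoE routing in NumPy; the dimensions, gate, and expert weights are toy assumptions for illustration, not Qwen3-Next's actual architecture.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Route one token through a sparse mixture-of-experts layer.

    Only the top_k highest-scoring experts run, so compute grows with
    top_k rather than the total expert count -- the idea behind
    activating ~3B of 80B parameters.
    """
    scores = x @ gate_w                      # router logits, one per expert
    top = np.argsort(scores)[-top_k:]        # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over selected experts only
    # Weighted sum of the selected experts' outputs.
    return sum(w * (x @ expert_ws[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 16, 8
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, num_experts))
expert_ws = [rng.standard_normal((d, d)) for _ in range(num_experts)]

y = moe_forward(x, gate_w, expert_ws, top_k=2)
print(y.shape)  # (16,)
```

With 2 of 8 experts active per token, only a quarter of the expert parameters participate in each forward pass, which is why training and inference cost can drop sharply while total capacity stays large.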

Among the specialized models, Qwen3-Coder, the Qwen programming model, received a major upgrade. The new Qwen3-Coder is jointly trained with the Qwen Code and Claude Code systems, significantly improving application effectiveness, inference speed, and code security. Qwen3-Coder has been widely praised by developers and enterprises for its strong code generation and completion abilities and its one-click project deployment and bug fixing; after it was open-sourced, its call volume on the well-known API platform OpenRouter reached a 14.74% share, ranking second globally.


[The Tongyi Qianwen (Qwen) model family]

Among the multimodal models, Qwen3-VL, Qwen's much-anticipated visual understanding model, was open-sourced in a major release, achieving breakthroughs in visual perception and multimodal reasoning and surpassing Gemini-2.5-Pro and GPT-5 on 32 core capability benchmarks. Qwen3-VL has extremely strong visual-agent and visual coding abilities: it not only understands images but can also operate phones and computers like a human, automatically completing many daily tasks. Given an image, Qwen3-VL can call an agent tool to zoom in on details and reason out a better answer through closer observation and analysis; given a design mockup, it can generate Draw.io/HTML/CSS/JS code, completing "what you see is what you get" visual programming. In addition, Qwen3-VL upgrades its 3D grounding capability to strengthen the foundation for embodied intelligence, extends context support to millions of tokens, and expands video comprehension to more than 2 hours.

Qwen3-Omni, an omni-modal model, made a surprise debut, setting open-source SOTA on 32 audio and video capability benchmarks. It can hear, speak, and write like a human, with broad application scenarios: it can be deployed in cars, smart glasses, and cell phones in the future, and users can set up personalized roles and adjust the dialogue style to create their own personal IP. Much like human babies, who perceive the world from birth, Qwen3-Omni incorporates "listening" and "speaking" from the very beginning, mixing unimodal and cross-modal data during pre-training. Previously, such hybrid training caused a model's abilities to constrain or even degrade one another; for example, audio comprehension would improve while text comprehension declined. Qwen3-Omni, however, achieves strong audio and video capabilities while keeping unimodal text and image performance stable, the first time this training effect has been realized in the industry.


[The Tongyi Wanxiang (Wan) model family]

Tongyi Wanxiang, the visual foundation model in the Tongyi large model family, introduced the Wan2.5-preview series, covering four model types: text-to-video, image-to-video, text-to-image, and image editing. The Wan2.5 video generation model can generate human voices, sound effects, and background music matched to the picture, realizing audio-video synchronized generation for the first time and further lowering the threshold for movie-grade video creation. Wan2.5's video generation length has been increased from 5 seconds to 10 seconds, supporting 1080P HD video at 24 frames per second, with further improved instruction following. Wan2.5 also comprehensively upgrades image generation: it can generate Chinese and English text and charts, and it supports image editing, so a single sentence of instruction is enough to retouch an image.


[The Tongyi Bailing speech model release]

At the 2025 Hangzhou Yunqi Conference, the Tongyi large model family also welcomed a brand-new member: the speech model Tongyi Bailing, comprising the speech recognition model Fun-ASR and the speech synthesis model Fun-CosyVoice. Fun-ASR is trained on tens of millions of hours of real speech data and has powerful contextual understanding and industry adaptability; Fun-CosyVoice provides hundreds of preset voices and can be used in customer service, sales, livestream e-commerce, consumer electronics, audiobooks, children's entertainment, and other scenarios.


The Tongyi large models have become the world's No. 1 open-source models and the models most chosen by Chinese enterprises. To date, Ali Tongyi has open-sourced more than 300 models covering all sizes and all modalities, including LLMs, programming, image, speech, and video, with global downloads exceeding 600 million and more than 170,000 derivative models, ranking first in the world. More than 1 million customers have adopted Tongyi's large models, and a report by the research firm Frost & Sullivan covering the first half of 2025 shows that Ali Tongyi ranks first in China's enterprise-level large model invocation market.

Average daily model calls up 15x: AliCloud Bailian releases a new Agent development framework

As a one-stop model service and Agent development platform, AliCloud Bailian has also received a heavy upgrade. At the conference, AliCloud released a new Agent development framework, ModelStudio-ADK, which breaks through the limitations of predefined orchestration to help enterprises efficiently develop Agents with autonomous decision-making, multi-round reflection, and cyclic execution capabilities, such as research agents that generate in-depth reports. With the continuous improvement of model capabilities and the explosion of Agent applications, the average daily model call volume on the Bailian platform has increased 15-fold over the past year.


At the framework level, AliCloud ModelStudio-ADK is built on Tongyi's open-source AgentScope and supports developing applications such as deep research agents, hardware agents, and complex retrieval agents. The framework fully supports cloud deployment and cloud component invocation, providing an enterprise-grade, service-stable, flexibly deployable high-code development mode that helps enterprises and developers quickly build and ship Agents for complex scenarios.
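The capabilities the framework targets, autonomous decision-making, reflection, and cyclic execution, all reduce to one control pattern: decide on an action, call a tool, observe the result, and loop until done. The sketch below shows that loop in plain Python with a made-up `search` tool and a hard-coded decision policy; it is an illustration of the pattern, not the ModelStudio-ADK or AgentScope API, where an LLM would make the decisions.

```python
# Toy agent loop: decide -> act -> observe -> reflect, repeated until the
# agent judges the task done. Tool names and the decision policy here are
# invented for illustration; a real agent would prompt a model to decide.

def search(query):
    """Stand-in for a retrieval tool."""
    return {"qwen3-max parameters": "over 1 trillion"}.get(query, "no result")

TOOLS = {"search": search}

def run_agent(task, max_steps=5):
    notes = []                                  # the agent's working memory
    for _ in range(max_steps):
        if not notes:                           # decide: nothing known yet
            action, arg = "search", task
        else:                                   # reflect: enough info, stop
            return f"answer: {notes[-1]}"
        observation = TOOLS[action](arg)        # act
        notes.append(observation)               # observe / remember
    return "gave up"

print(run_agent("qwen3-max parameters"))  # answer: over 1 trillion
```

The point of a framework like ADK is everything around this loop: running it as a stable cloud service, persisting the memory, and sandboxing the tools, rather than the loop itself, which stays this simple.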

At the model level, AliCloud Bailian continues to launch new flagship models of the Tongyi Qianwen family. Built on the powerful reasoning capabilities of the Qwen3 series, they can drive Agents toward more efficient autonomous planning and decision-making, with reasoning performance improved by 50% and a decision success rate of 90%. Currently, users can call Qwen, Wan, DeepSeek, and more than 200 other industry-leading models with one click.

At the component level, covering the various components required for Agent development and deployment, AliCloud Bailian integrates seven enterprise-level capabilities, including MCP Server for tool connectivity, RAG Server for multi-modal data fusion, Sandbox Server for sandboxed tools, Memory Server for agent memory access, and Pay Server for payment and subscription services. Taking Pay Server as an example, the service is jointly launched by AliCloud Bailian and Alipay and is the industry's first to provide a professional commercial payment channel for enterprise-grade Agents. The first batch of Bailian Agent applications developed on ModelStudio-ADK, such as DeepResearch, Agentic-RAG, and Computer-Use Agent, are now available for users to experience online or to download the code for free for secondary development.

At the conference, AliCloud Bailian also upgraded ModelStudio-ADP, its low-code Agent development platform, which has been widely used in finance, education, e-commerce, and other sectors; to date, more than 200,000 developers have built over 800,000 Agents on the Bailian platform. MYbank, for example, has developed a loan-auditing application based on ModelStudio-ADP that supports 26 types of credentials, such as contracts, invoices, and business licenses, and more than 400 types of fine-grained objects, such as storefronts, restaurant kitchens, dining areas, and shelf goods, with an accuracy rate above 95%; its task processing time has been cut from 3 hours to within 5 minutes.

Meanwhile, Wuying AgentBay, an important part of AliCloud's Agent infrastructure, received a major upgrade. Wuying AgentBay is AliCloud's tailor-made "super brain" for Agents: it can dynamically call on cloud computing, storage, and toolchain resources, greatly breaking through the compute limitations Agents face on local devices. At the Yunqi Conference, AgentBay also introduced new capabilities such as a self-evolving engine, custom images, security fences, and memory-state management, and for the first time showed Agentic Computer, a new personal computing product with a brand-new mode of human-computer interaction, revolutionary "memory" capability, and nearly unlimited computing power on the cloud.

AI compute power up more than 5x in a year: AliCloud's AI infrastructure comprehensively upgraded

AliCloud has carried out coordinated optimization and system innovation around AI across the full hardware and software stack, and has initially formed an operating system with Tongyi at its core and a next-generation computer with the AI cloud at its core. In the past year, AliCloud's AI computing power has grown more than 5-fold and its AI storage capacity more than 4-fold.


At the 2025 Yunqi Conference, the fully upgraded AliCloud AI infrastructure made a heavyweight debut, comprehensively demonstrating AliCloud's full-stack AI capabilities, from underlying chips, supernode servers, high-performance networks, distributed storage, and intelligent computing clusters to the AI platform and model training and inference services.

At the server level, AliCloud released the new-generation Panjiu 128 supernode AI server. Independently developed and designed by AliCloud, the new Panjiu supernode has high density, high performance, and high availability as its core advantages. It can efficiently support a variety of AI chips, and a single cabinet holds 128 AI compute chips, a new industry record for density. The Panjiu supernode integrates Ali's self-developed CIPU 2.0 chip and EIC/MOC high-performance network cards, adopts a highly scalable open architecture, and can deliver Pb/s-level scale-up bandwidth with extremely low latency of 100 ns; compared with traditional architectures, inference performance improves by 50% at the same AI compute power.


[Panjiu AI Infra 2.0 128 supernode server]

At the network level, AliCloud's new-generation high-performance network HPN 8.0 was newly unveiled. To cope with the massive data transmission demands of the large-model era, HPN 8.0 adopts a unified training-and-inference architecture, raising storage network bandwidth to 800 Gbps and GPU interconnect bandwidth to 6.4 Tbps. It can support efficient interconnection of 100,000 GPUs in a single cluster, providing a high-performance, deterministic cloud network for clusters of tens of thousands of cards and helping improve the efficiency of AI training and inference.

At the storage level, AliCloud's distributed storage has been comprehensively upgraded for AI. The high-performance parallel file system CPFS raises single-client throughput to 40 GB/s, meeting AI training's extreme demand for fast data reads; the table store Tablestore provides high-performance memory and knowledge bases for agents; and the object storage service OSS launched Vector Bucket, which offers cost-effective mass storage for vector data, cutting costs by 95% compared with a self-built open-source vector database. Combined with OSS MetaQuery's semantic retrieval and content-awareness capabilities, AI applications such as RAG can be built quickly.
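What a vector store such as Vector Bucket holds is embeddings, and what a RAG application asks of it is nearest-neighbor retrieval. The sketch below shows that core operation with a toy bag-of-words "embedding" and cosine similarity; the vocabulary, `embed`, and `top_k` are invented for illustration and are not the OSS API, only the underlying idea.

```python
import numpy as np

# Minimal sketch of RAG-style retrieval: embed documents, then return the
# nearest ones to a query by cosine similarity. The "embedding" is a toy
# bag-of-words vector, not a real model.

VOCAB = ["gpu", "storage", "network", "model", "training"]

def embed(text):
    """Map text to a unit-length bag-of-words vector over VOCAB."""
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

def top_k(query, docs, k=1):
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    return sorted(docs, key=lambda d: -float(embed(d) @ q))[:k]

docs = ["gpu training cluster", "object storage pricing", "network bandwidth"]
print(top_k("storage cost", docs))  # ['object storage pricing']
```

A production store replaces the linear scan with an approximate-nearest-neighbor index so that retrieval stays fast over billions of vectors; the claimed 95% cost reduction is about where those vectors live, not about changing this query model.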

At the AI intelligent-computing cluster level, the Lingjun cluster supports stable interconnection of 100,000 cards on the HPN network through multi-level affinity and topology-aware scheduling, and its multi-level scalable architecture shortens the interconnect path between cards and optimizes bandwidth. The Lingjun cluster's task-oriented stability design and minute-level fault recovery effectively improve cluster stability for model training tasks.

The explosion of AI demand has also driven up demand for general-purpose computing power, and AliCloud's general-purpose computing has been comprehensively upgraded. Relying on the self-developed "Feitian (Apsara) + CIPU" architecture, AliCloud's ninth-generation enterprise-class instances use the latest chips from Intel and AMD, providing stable, secure, high-performance general-purpose CPU power for Agents while significantly raising compute levels. Among them, the ninth-generation AMD instance g9ae offers physical-core specifications with performance improved by up to 67%, making it especially suitable for high-concurrency enterprise scenarios such as offline data analysis and processing and video transcoding.

Container computing, which provides elasticity, scheduling optimization, and scaled operations for AI workloads, also received heavy upgrades. The container service ACK adds Lingjun node pools and introduces core functions such as model-aware intelligent routing, multi-role inference workload management, and fault self-healing, shortening automatic fault recovery time by 85% and speeding up model inference cold starts by 10x. The container compute service ACS strengthens network-topology-aware scheduling, improving overall task communication performance by 30%, and is deeply optimized for AI Agent scenarios: out-of-the-box serverless GPU power supports massive concurrency elasticity of 15,000 sandboxes per minute, and combined with secure sandboxing and intelligent sleep and wake-up, it enables Agents to start on demand and respond efficiently.

The joint optimization of AliCloud's AI platform PAI and the Tongyi large models confirms the "1+1>2" effect of full-stack AI. At the training layer, for MoE models, optimizations such as a unified scheduling mechanism, adaptive computation-communication masking, EP compute load balancing, and compute-memory-separated parallelism raised the end-to-end training acceleration ratio of the Tongyi Qianwen models by more than 3x, and an upgraded DiT training engine cut Tongyi Wanxiang's single-sample training time by 28.1%. At the inference layer, full-link optimizations including large-scale EP, PD/AF separation, weight optimization, and LLM intelligent routing delivered significant efficiency gains: inference throughput (TPS) up 71%, latency (TPOT) down 70.6%, and scale-out time down 97.6%.
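To make the two headline inference metrics concrete: TPS counts tokens generated per second across the service, while TPOT is the time between successive output tokens seen by a single request, so the two improvements compound from a user's perspective. A quick arithmetic check with assumed baseline numbers (the baselines are invented; only the percentage changes come from the text):

```python
# Toy sanity check on the reported inference gains. Baseline values are
# assumptions for illustration; only the percentages come from the article.

baseline_tps = 1000.0        # assumed service throughput, tokens/sec
baseline_tpot_ms = 100.0     # assumed ms between tokens for one request

new_tps = baseline_tps * (1 + 0.71)            # throughput up 71%
new_tpot_ms = baseline_tpot_ms * (1 - 0.706)   # per-token latency down 70.6%

print(round(new_tps))         # 1710
print(round(new_tpot_ms, 1))  # 29.4
```

In other words, under these assumed baselines a request would stream tokens roughly 3.4x faster (100 ms to 29.4 ms per token) while the service as a whole also serves 71% more tokens per second.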

"AliCloud is making every effort to build a new AI supercomputer that has both leading AI infrastructure and leading models, with the two highly synergized in product design and operational architecture to ensure maximum efficiency when calling and training the Tongyi Qianwen models on AliCloud," Wu Yongming said.

Today, AliCloud operates China's No. 1 and a world-leading AI infrastructure and cloud computing network, with 90 availability zones in 29 geographic regions worldwide. Data from the third-party research firm Omdia for the first half of 2025 shows that AliCloud holds 35.8% of China's AI cloud market, more than the combined share of the second- through fourth-ranked players, and among Fortune China 500 companies that have adopted generative AI, more than 53% chose AliCloud, the highest penetration rate. Over the next 3 years, Alibaba will invest 380 billion yuan in cloud and AI infrastructure, more than the total of the past decade.
