CES 2026: Jensen Huang Unveils Six Chips at Once, Building the Most Powerful AI Supercomputer
NVIDIA founder and CEO Jensen Huang delivered the first keynote address of the year at CES 2026. Clad in his signature leather jacket as always, Huang unveiled eight major announcements over the course of an hour and a half, providing an in-depth introduction to the entire new-generation platform, from chips and racks to network design.
In accelerated computing and AI infrastructure, NVIDIA released the NVIDIA Vera Rubin POD AI supercomputer, NVIDIA Spectrum-X Ethernet Co-Packaged Optics, the NVIDIA Inference Context Memory Storage Platform, and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.

The NVIDIA Vera Rubin POD features six NVIDIA in-house chips covering CPU, GPU, scale-up, scale-out, storage, and data processing, all co-engineered to meet the demands of advanced models while reducing computational costs.
Among these, the Vera CPU employs a custom Olympus core architecture, while the Rubin GPU achieves NVFP4 inference performance of up to 50 petaflops. Each GPU's NVLink bandwidth reaches up to 3.6 TB/s. The platform supports third-generation universal confidential computing (the first rack-level TEE), enabling a complete trusted execution environment spanning the CPU and GPU domains.

All of these chips have come back from the fab, NVIDIA has validated the complete NVIDIA Vera Rubin NVL72 system, partners have begun running their own integrated AI models and algorithms on it, and the entire ecosystem is preparing for deployment on Vera Rubin.
Among the other releases, NVIDIA Spectrum-X Ethernet Co-Packaged Optics significantly improves power efficiency and application uptime; the NVIDIA Inference Context Memory Storage Platform redefines the storage stack to reduce redundant computation and improve inference efficiency; and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72 cuts token costs for large MoE models to one tenth.

On the open-model front, NVIDIA announced an expansion of its open model suite, releasing new models, datasets, and libraries. The NVIDIA Nemotron open model family adds agentic RAG, security, and speech models, and NVIDIA also released a brand-new open model for all types of robots, though Jensen Huang did not elaborate on it in his keynote.
In physical AI, the ChatGPT moment for physical AI has arrived. NVIDIA's full-stack technology empowers the global ecosystem to transform industries through AI-driven robotics. NVIDIA's AI toolkit now includes the all-new Alpamayo open model; together, these technologies enable the global transportation industry to rapidly achieve safe Level 4 autonomous driving. The NVIDIA DRIVE autonomous driving platform is now in production, powering all new Mercedes-Benz CLA models for L2++ AI-defined driving.

I. New AI Supercomputer: Six Self-Developed Chips Deliver 3.6 EFLOPS Per Rack
Jensen Huang believes that every 10 to 15 years, the computing industry undergoes a comprehensive transformation. This time, however, two platform shifts are occurring simultaneously: from CPUs to GPUs, and from “programming software” to “training software.” Accelerated computing and AI are reshaping the entire computing stack. The trillion-dollar computing industry of the past decade is now undergoing a modernization overhaul.
Meanwhile, demand for computing power has skyrocketed. Model sizes increase tenfold annually, the number of tokens models process grows fivefold yearly, while the cost per token decreases tenfold each year.
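A rough way to see why demand keeps skyrocketing despite falling token prices (the growth figures are the keynote's; the compounding arithmetic and the assumption that per-token compute scales with model size are ours):

```python
# Illustrative compounding of the keynote's annual growth figures (our arithmetic).
# Assumption (ours): compute per token scales roughly with model size.
model_size_growth = 10       # model parameters: ~10x per year
token_volume_growth = 5      # tokens processed: ~5x per year
cost_per_token_drop = 10     # cost per token: ~10x cheaper per year

compute_demand_growth = model_size_growth * token_volume_growth   # ~50x per year
print(compute_demand_growth)                          # 50
print(compute_demand_growth > cost_per_token_drop)    # demand outpaces the cost decline
```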

To meet this demand, NVIDIA has decided to release new computing hardware annually. Jensen Huang revealed that Vera Rubin has now entered full-scale production.
NVIDIA's new AI supercomputer, the NVIDIA Vera Rubin POD, incorporates six custom-designed chips: the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 (CX9) SuperNIC, BlueField-4 DPU, and Spectrum-X 102.4T CPO switch.
(1) Vera CPU: Designed for data movement and agentic processing, it features 88 custom NVIDIA Olympus cores and 176 threads with NVIDIA spatial multithreading, plus 1.8 TB/s of NVLink-C2C supporting CPU-GPU unified memory. System memory reaches 1.5 TB (three times that of the Grace CPU), with SOCAMM LPDDR5X memory bandwidth of 1.2 TB/s. It supports rack-level confidential computing and doubles data processing performance.

(2) Rubin GPU: With its Transformer Engine, the Rubin GPU delivers NVFP4 inference performance of up to 50 petaflops, 5 times that of the Blackwell GPU, while remaining backward compatible and improving BF16/FP4-level performance without sacrificing inference accuracy. NVFP4 training performance reaches 35 petaflops, 3.5 times Blackwell's.
Rubin is also the first platform to support HBM4, with HBM4 bandwidth of up to 22 TB/s, 2.8 times that of the previous generation, delivering the performance required by demanding MoE models and AI workloads.

(3) NVLink 6 switch: Per-lane throughput rises to 400 Gb/s, using SerDes technology for high-speed signaling. Each GPU gets 3.6 TB/s of all-to-all communication bandwidth, twice the previous generation, and total switch bandwidth is 28.8 TB/s. In-network computing reaches 14.4 TFLOPS at FP8 precision, and the switch is 100% liquid-cooled.

(4) NVIDIA ConnectX-9 SuperNIC: Provides 1.6 Tb/s of bandwidth per GPU and is optimized for large-scale AI, with a fully software-defined, programmable, and accelerated data path.

(5) NVIDIA BlueField-4: An 800 Gb/s DPU serving as a smart NIC and storage processor, equipped with a 64-core Grace CPU. Combined with the ConnectX-9 SuperNIC, it offloads network- and storage-related computation while strengthening security. Its compute performance is 6 times, and its memory bandwidth 3 times, that of the previous generation, and GPU access to data storage is accelerated 2x.

(6) NVIDIA Vera Rubin NVL72: At the system level, all of the above components are integrated into a single-rack system featuring 2 trillion transistors, with NVFP4 inference performance of 3.6 exaflops and NVFP4 training performance of 2.5 exaflops.
The system offers 54 TB of LPDDR5X memory, 2.5 times the previous generation; 20.7 TB of total HBM4 memory, 1.5 times the previous generation; and HBM4 bandwidth of 1.6 PB/s, 2.8 times the previous generation. Total scale-up (NVLink) bandwidth reaches 260 TB/s, exceeding the total bandwidth of the global internet.
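How the per-GPU figures above roll up to these rack-level numbers is simple arithmetic (ours, assuming 72 Rubin GPUs per rack, as the NVL72 name implies):

```python
# Illustrative aggregation of per-GPU specs to the NVL72 rack level (our arithmetic).
gpus_per_rack = 72                    # assumed from the "NVL72" name

nvfp4_inference_pf = 50               # petaflops per GPU
nvfp4_training_pf = 35                # petaflops per GPU
nvlink_tb_s = 3.6                     # TB/s of NVLink bandwidth per GPU
hbm4_tb_s = 22                        # TB/s of HBM4 bandwidth per GPU

print(gpus_per_rack * nvfp4_inference_pf / 1000)   # 3.6  exaflops inference
print(gpus_per_rack * nvfp4_training_pf / 1000)    # ~2.5 exaflops training
print(gpus_per_rack * nvlink_tb_s)                 # ~259 TB/s scale-up bandwidth
print(gpus_per_rack * hbm4_tb_s / 1000)            # ~1.6 PB/s HBM4 bandwidth
```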

This system is based on the third-generation MGX rack design. Its compute trays feature a modular, serverless, cable-free, and fanless design, enabling assembly and maintenance speeds 18 times faster than the GB200. What previously required two hours of assembly now takes only about five minutes. Additionally, while the original system utilized liquid cooling for approximately 80%, the current system employs liquid cooling for 100%.

The NVLink Switch tray enables zero-downtime maintenance and fault tolerance, allowing the rack to remain operational even when trays are removed or partially deployed. The second-generation RAS engine performs zero-downtime health checks.
These features enhance system uptime and throughput, further reducing training and inference costs while meeting data center requirements for high reliability and maintainability.
Over 80 MGX partners are ready to support the deployment of the Rubin NVL72 in hyperscale networks.
II. Three Major Innovations Revolutionize AI Inference Efficiency: New CPO Devices, New Context Storage Layer, New DGX SuperPOD
Meanwhile, NVIDIA unveiled three major new products: NVIDIA Spectrum-X Ethernet Co-Packaged Optics, the NVIDIA Inference Context Memory Storage Platform, and the NVIDIA DGX SuperPOD based on DGX Vera Rubin NVL72.
1. NVIDIA Spectrum-X Ethernet Co-Packaged Optical Device
NVIDIA Spectrum-X Ethernet Co-Packaged Optics is based on the Spectrum-X architecture and features a dual-chip design with 200 Gb/s SerDes. Each ASIC provides 102.4 Tb/s of bandwidth.
This switching platform comprises a 512-port high-density system and a 128-port compact system, each port operating at a rate of 800Gb/s.
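Those port counts line up with the per-ASIC figure (the arithmetic is ours; mapping the 512-port system onto multiple ASICs is an inference, not a stated spec):

```python
# Consistency check on the stated port counts (our arithmetic, not an NVIDIA spec sheet).
port_rate_gb_s = 800

compact_total = 128 * port_rate_gb_s       # 102,400 Gb/s = 102.4 Tb/s, one ASIC's worth
high_density_total = 512 * port_rate_gb_s  # 409,600 Gb/s = 409.6 Tb/s, i.e. four such ASICs
print(compact_total / 1000, high_density_total / 1000)   # 102.4 409.6 (Tb/s)
```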

CPO (co-packaged optics) switching systems deliver a 5 times improvement in energy efficiency, 10 times better reliability, and 5 times higher application uptime.
This means more tokens can be processed daily, further reducing the total cost of ownership (TCO) for data centers.
2. NVIDIA Inference Context Memory Storage Platform
The NVIDIA Inference Context Memory Storage Platform is a POD-level AI native storage infrastructure designed to store KV Cache. Accelerated by BlueField-4 and Spectrum-X Ethernet, it tightly integrates with NVIDIA Dynamo and NVLink to enable coordinated context scheduling across memory, storage, and networking.
This platform treats context as a first-class data type, enabling 5 times the inference performance and 5 times better energy efficiency.

This is crucial for improving long-context applications such as multi-turn dialog, RAG, and agentic multi-step reasoning, which heavily rely on the ability to efficiently store, reuse, and share context throughout the system.
AI is evolving from chatbots to Agentic AI, capable of reasoning, invoking tools, and maintaining long-term state. Context windows have expanded to millions of tokens. This context is stored in KV Caches; recalculating it at every step wastes GPU time and introduces significant latency, necessitating storage.
However, while GPU memory is fast, it is scarce, and traditional network storage is inefficient for short-term context. The bottleneck in AI inference is shifting from computation to context storage. Therefore, a new memory layer optimized for inference is needed—one positioned between the GPU and storage.
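The pattern this new tier targets can be sketched in a few lines (a minimal illustration of KV-cache reuse in our own hypothetical code; ContextStore and the cost model are ours, not NVIDIA's API):

```python
# A hypothetical external context tier keyed by conversation ID (our own
# illustration, not NVIDIA's API). In a real system this tier would sit
# between GPU memory and bulk storage and hold serialized KV caches so that
# long contexts are not recomputed on every turn.

class ContextStore:
    """Fast external tier mapping a conversation ID to its KV cache."""
    def __init__(self):
        self._kv = {}

    def get(self, cid):
        return self._kv.get(cid)

    def put(self, cid, kv_cache):
        self._kv[cid] = kv_cache


def prefill(num_tokens):
    """Stand-in for the expensive prefill pass that builds KV entries.
    Returns the cache plus the amount of simulated GPU work performed."""
    return {"cached_tokens": num_tokens}, num_tokens


def serve_turn(store, cid, history_tokens, new_tokens):
    kv = store.get(cid)
    work = 0
    if kv is None:
        kv, work = prefill(history_tokens)       # cache miss: recompute the whole context
    extra, extra_work = prefill(new_tokens)      # new tokens always need processing
    kv = {"cached_tokens": kv["cached_tokens"] + extra["cached_tokens"]}
    store.put(cid, kv)                           # persist for the next turn
    return work + extra_work


store = ContextStore()
print(serve_turn(store, "chat-1", history_tokens=100_000, new_tokens=200))  # full prefill
print(serve_turn(store, "chat-1", history_tokens=100_200, new_tokens=200))  # context reused
```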

This layer is no longer an afterthought; it must be designed in tandem with networking and storage so that context data moves with minimal overhead.
As a new storage tier, the NVIDIA Inference Context Memory Storage Platform does not reside directly within the host system but connects to compute devices externally via BlueField-4. Its key advantage lies in enabling more efficient scaling of storage pools, thereby avoiding redundant computation of KV Cache.
NVIDIA is working closely with storage partners to bring the NVIDIA Inference Context Memory Storage Platform to the Rubin platform, enabling customers to deploy it as part of a fully integrated AI infrastructure.
3. NVIDIA DGX SuperPOD Built on Vera Rubin
At the system level, the NVIDIA DGX SuperPOD serves as a blueprint for deploying large-scale AI factories. It incorporates eight DGX Vera Rubin NVL72 systems, utilizes NVLink 6 for vertical scaling, employs Spectrum-X Ethernet for horizontal scaling, and features an integrated NVIDIA Inference Context Memory Storage Platform, all of which have undergone engineering validation.
The entire system is managed by NVIDIA Mission Control software, delivering ultimate efficiency. Customers can deploy it as a turnkey platform to complete training and inference tasks with fewer GPUs.
Through extreme co-design across the six chips, trays, racks, PODs, data centers, and software, the Rubin platform achieves significant reductions in training and inference costs. Compared with the previous-generation Blackwell, training an MoE model of the same scale requires only 1/4 as many GPUs, and at the same latency, the token cost of large MoE models falls to 1/10.

The NVIDIA DGX SuperPOD featuring the DGX Rubin NVL8 system was also announced.

Leveraging the Vera Rubin architecture, NVIDIA is collaborating with partners and customers to build the world's largest, most advanced, and most cost-effective AI systems, accelerating the mainstream adoption of AI.
Rubin infrastructure will be made available through cloud service providers and system integrators in the second half of this year, with Microsoft among the first to deploy it.
III. Expanding the Open Model Universe: Key Contributors to New Models, Data, and Open-Source Ecosystems
At the software and model level, NVIDIA continues to increase its investment in open-source initiatives.
Data from leading development platforms like OpenRouter shows that AI model usage has increased twentyfold over the past year, with approximately one-quarter of tokens originating from open-source models.

In 2025, NVIDIA was the largest contributor of open-source models, datasets, and recipes on Hugging Face, releasing 650 open-source models and 250 open-source datasets.

NVIDIA's open-source models rank among the top performers across multiple benchmarks. Developers can not only utilize these open-source models but also learn from them, continuously train them, expand their datasets, and build AI systems using open-source tools and documented technologies.

Inspired by Perplexity, Jensen Huang observed that agents should be multi-model, multi-cloud, and hybrid-cloud, a fundamental architecture for agentic AI systems that nearly all startups are adopting.

Leveraging NVIDIA's open-source models and tools, developers can now customize AI systems and harness cutting-edge model capabilities. NVIDIA has integrated these frameworks into “Blueprints” and incorporated them into its SaaS platform. Users can achieve rapid deployment through these Blueprints.
In the live demonstration, the system automatically determined whether each task should be handled by a local private model or a cloud frontier model, invoked external tools (such as email APIs, robot control interfaces, and calendar services), and fused multiple modalities to handle text, speech, images, and robot sensor signals in a unified way.
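That routing behavior can be sketched roughly as follows (a minimal illustration in our own code; the model names, routing rule, and tool registry are hypothetical, not NVIDIA's Blueprint API):

```python
# Hypothetical agent router: decide per request whether a local private model
# or a cloud frontier model should handle it, and expose a small tool registry.
# Names and the routing rule are illustrative only.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Request:
    text: str
    contains_private_data: bool = False
    needs_deep_reasoning: bool = False

@dataclass
class AgentRouter:
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def route(self, req: Request) -> str:
        # Keep private data on-prem; send hard reasoning to the frontier model.
        if req.contains_private_data:
            return "local-private-model"
        if req.needs_deep_reasoning:
            return "cloud-frontier-model"
        return "local-private-model"

    def call_tool(self, name: str, arg: str) -> str:
        return self.tools[name](arg)

router = AgentRouter()
router.register_tool("calendar", lambda q: f"free slots for: {q}")
print(router.route(Request("summarize my inbox", contains_private_data=True)))
print(router.route(Request("plan a multi-step research task", needs_deep_reasoning=True)))
print(router.call_tool("calendar", "next Tuesday"))
```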

These complex capabilities were unimaginable in the past, but today they have become commonplace. Similar capabilities are now available on enterprise platforms such as ServiceNow and Snowflake.
IV. Open-Source Alpamayo Model: Enabling Autonomous Vehicles to “Think”
NVIDIA believes that physical AI and robotics will ultimately become the world's largest consumer electronics segment. Everything that moves will eventually achieve full autonomy, powered by physical AI.
AI has progressed through stages of Perceptual AI, Generative AI, and Agentic AI, and is now entering the era of Physical AI. Intelligence is stepping into the real world, with models capable of understanding physical laws and generating actions directly from perceptions of the physical world.

To achieve this, physical AI must learn the fundamental principles of the world: object persistence, gravity, and friction. Acquiring these capabilities relies on three computers: a training computer (DGX) to build AI models, an inference computer (robotics/automotive chips) for real-time execution, and a simulation computer (Omniverse) to generate synthetic data and validate physical logic.
The core model among them is the Cosmos world foundation model, which aligns language, images, 3D data, and physical laws to support the entire chain from simulation to training-data generation.
Physical AI will appear in three types of entities: buildings (such as factories and warehouses), robots, and autonomous vehicles.
Jensen Huang believes that autonomous driving will be the first large-scale application scenario for physical AI. Such systems require understanding the real world, making decisions, and executing actions, demanding extremely high standards for safety, simulation, and data.
In response, NVIDIA announced Alpamayo, a comprehensive ecosystem of open models, simulation tools, and physical AI datasets intended to accelerate the development of safe, reasoning-based physical AI.
Its product portfolio provides the foundational building blocks for global automakers, suppliers, startups, and researchers to develop Level 4 autonomous driving systems.

Alpamayo is the industry's first model that truly enables autonomous vehicles to “think.” The model has been open-sourced. It breaks problems down into steps, reasons through all possibilities, and selects the safest path.

This reasoning-based task-action model enables autonomous driving systems to handle complex edge scenarios they have never encountered before, such as traffic light failures at busy intersections.
Alpamayo has 10 billion parameters, powerful enough to handle autonomous driving tasks while remaining lightweight enough to run on the workstations autonomous driving researchers use.
It takes text, surround-view camera feeds, vehicle history, and navigation inputs, and outputs driving trajectories along with its reasoning process, letting passengers understand why the vehicle took a particular action.
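As a rough picture of that input/output contract (the types and field names below are our own hypothetical sketch, not the published Alpamayo interface):

```python
# Hypothetical sketch of a reasoning-based driving model's interface; the
# dataclass names and fields are our own, not the published Alpamayo API.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DrivingInput:
    instruction: str                             # free-form text, e.g. a navigation hint
    camera_frames: List[bytes]                   # surround-view camera images
    vehicle_history: List[Tuple[float, float]]   # recent (x, y) ego positions
    route: List[Tuple[float, float]]             # navigation waypoints

@dataclass
class DrivingOutput:
    trajectory: List[Tuple[float, float]]        # planned (x, y) waypoints to follow
    reasoning: str                               # natural-language chain of reasoning

def plan(inputs: DrivingInput) -> DrivingOutput:
    """Placeholder for the model call: consume multimodal inputs, emit a
    trajectory plus a human-readable explanation of why it was chosen."""
    return DrivingOutput(
        trajectory=inputs.route[:5],
        reasoning="Following the route; no obstacles detected in this stub.",
    )

example = DrivingInput(
    instruction="continue to the highway on-ramp",
    camera_frames=[],
    vehicle_history=[(0.0, 0.0), (1.0, 0.2)],
    route=[(2.0, 0.4), (3.0, 0.6), (4.0, 0.8)],
)
print(plan(example).reasoning)
```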
In the promotional video shown at the event, an autonomous vehicle powered by Alpamayo independently performs maneuvers such as avoiding pedestrians and anticipating a left-turning vehicle in order to change lanes and bypass it, all without any human intervention.

Jensen Huang stated that the Mercedes-Benz CLA equipped with Alpamayo has entered production and was recently rated by NCAP as the world's safest car. Every line of code, every chip, and every system has undergone safety certification. The system will launch in the U.S. market and introduce enhanced driving capabilities later this year, including hands-free driving on highways and end-to-end autonomous driving in urban environments.

NVIDIA also released part of the dataset used to train Alpamayo, along with Alpha-Sim, an open-source simulation framework for evaluating reasoning models. Developers can fine-tune Alpamayo on their own data or generate synthetic data with Cosmos, then train and test autonomous driving applications on a mix of real and synthetic data. In addition, NVIDIA announced that the NVIDIA DRIVE platform is now in production.
NVIDIA announced that global robotics leaders including Boston Dynamics, Franka Robotics, Surgical Robotics, LG Electronics, NEURA, XRLabs, and Zhi Yuan Robotics are building on NVIDIA Isaac and GR00T.

Jensen Huang also announced the latest collaboration with Siemens. Siemens is integrating NVIDIA CUDA-X, AI models, and Omniverse into its suite of EDA, CAE, and digital twin tools and platforms. Physical AI will be extensively applied across the entire process from design and simulation to manufacturing and operations.
Closing Remarks: Embracing open-source with one hand, while making hardware systems irreplaceable with the other.
As the focus of AI infrastructure shifts from training to large-scale inference, platform competition has evolved from single-point computing power to a system-level engineering effort encompassing chips, racks, networks, and software. The goal now is to deliver maximum inference throughput at the lowest total cost of ownership (TCO). AI is entering a new phase of “factory-scale operations.”
NVIDIA places significant emphasis on system-level design. Rubin delivers performance and cost-efficiency gains in both training and inference and serves as a drop-in replacement that enables seamless migration from Blackwell.
In terms of platform positioning, NVIDIA continues to prioritize training as critical, believing that only by rapidly training state-of-the-art models can the inference platform truly deliver value. Consequently, the introduction of NVFP4 training in the Rubin GPU further enhances performance and reduces total cost of ownership (TCO).
Meanwhile, the AI computing giant has significantly strengthened network communication across both scale-up and scale-out architectures and, having identified context as a critical bottleneck, pursued coordinated design across storage, networking, and compute.
While NVIDIA aggressively embraces open-source initiatives, it simultaneously makes its hardware, interconnects, and system designs increasingly “indispensable.” This closed-loop strategy—continuously expanding demand, incentivizing token consumption, driving large-scale inference, and providing cost-effective infrastructure—is building an even more impregnable moat for NVIDIA.