Full text of Jen-Hsun Huang's CES 2025 speech: the world's fastest GeForce GPUs, a $3,000 personal AI supercomputer, and world foundation models


On January 7, Beijing time, NVIDIA founder and CEO Jen-Hsun Huang, wearing a new Tom Ford jacket, delivered the opening keynote at CES in Las Vegas and unveiled a series of new products and technologies.

Here are the main highlights of the launch:

Introduced the RTX 5090, the flagship of a new generation of GPUs based on the Blackwell architecture. The RTX 5090 packs 92 billion transistors and delivers 4,000 AI TOPS (trillion operations per second) of performance for $1,999.

The RTX 5070, RTX 5070 Ti, RTX 5080 and RTX 5090 are priced at $549 (about ¥4,023), $749 (about ¥5,489), $999 (about ¥7,321), and $1,999 (about ¥14,651), respectively. Of these, the RTX 5070 matches the performance of the RTX 4090, previously priced at $1,599, at roughly a third of the price.

Introduced the GB200 NVLink72, built on the latest key interconnect technology of the Blackwell architecture: 130 trillion transistors, 72 Blackwell GPUs delivering 1.4 exaFLOPS of FP4 compute, and 2,592 Grace CPU cores.

"Scaling law continues": the first scaling law is pre-training; the second scaling law is post-training; and the third scaling law is computation at test time.

Demonstrated agentic AI with test-time scaling, supporting tools such as calculators, web search, semantic search, and SQL search, and even generating podcasts.

Launched the Nemotron model family, including Llama Nemotron large language models in Nano, Super, and Ultra sizes.

AI agents could be the next robotics industry, a trillion-dollar opportunity.

Launched Cosmos, an open, commercially usable physical-AI world foundation model that converts images and text into actionable tasks for robots, seamlessly integrating vision and language understanding to perform complex actions.

Announced generative AI models and blueprints to further extend NVIDIA Omniverse integration into physical AI applications such as robotics, self-driving cars and vision AI.

Physical AI will revolutionize the $50 trillion manufacturing and logistics industry, with everything that moves - from cars and trucks to factories and warehouses - enabled by robots and AI.

Launched Project Digits, the world's smallest personal AI supercomputer. It is powered by the new Grace Blackwell superchip and lets individuals directly run models with up to 200 billion parameters; two linked Project Digits units can run models with up to 405 billion parameters.

The following is the full text of Jen-Hsun Huang's speech:

It all started in 1993.

Welcome to CES! Is everyone happy to be in Las Vegas? Do you guys like my jacket? (Editor's note: $8,990!)

I figure my speaking style is a bit different from Gary Shapiro's (CEO of the CTA, which produces CES), but then again, I'm in Las Vegas. If this doesn't work, if you're all against it, then... you'll just have to get used to it. In another hour or so, you'll think this is not so bad.


Welcome to NVIDIA. Actually, ladies and gentlemen, you're inside NVIDIA's digital twin right now, where everything is generated by AI.

It has been a remarkable journey, a remarkable year, and it all started in 1993.

It started with the NV1 (NVIDIA's first GPU). We wanted to build computers that could do things ordinary computers couldn't, and the NV1 made it possible to play console games on a PC. Our programming architecture was called UDA, the Unified Device Architecture.


The first application developed on UDA was Virtua Fighter. Six years later, in 1999, we invented the programmable GPU, and in the more than 20 years since, this incredible processor, the GPU, has seen amazing advances. It has made modern computer graphics possible.

Now, thirty years later, Virtua Fighter is fully cinematic. This is our new Virtua Fighter project, and I can't wait to tell you about it; it's super amazing.


Another six years later, we invented CUDA, which let us express the programmability of GPUs so that a rich set of algorithms could benefit from it. At first, it was hard to explain, and it took years, about six in fact.

Somehow, six years later, in 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton discovered CUDA and used it to train AlexNet, and the rest is history.


Today, AI is advancing at an incredible pace. We started with perception AI, understanding images, words, and sounds; then generative AI, generating images, text, and sounds; and now agentic AI, which can perceive, reason, plan, and act. The next phase is physical AI, some of which we'll discuss tonight.


In 2018, something pretty amazing happened. Google released BERT (Bidirectional Encoder Representations from Transformers), and the world of AI really took off.

As you know, the Transformer completely changed the landscape of AI; in fact, it completely changed the computing landscape. We rightly recognized that AI is more than just a new application and business opportunity; more importantly, machine learning, driven by Transformers, would fundamentally change the way computing works.

Today, computing has been revolutionized at every level, from manually writing instructions that run on CPUs to machine learning that creates and optimizes neural networks on GPUs. Every level of the technology stack has changed radically; an incredible transformation in just 12 years.


Now, we can understand information in almost any modality. Of course, you've seen things like text, images, sounds, but we can understand not only those, but also amino acids, physics, and so on. Not only do we understand them, we can translate and generate them. The applications are almost endless.

In fact, for almost any AI application you see, if you ask three basic questions: What is the modality of the input? What modality does it translate to? What modality does it generate? Almost every application has an answer.

So when you see an application that is powered by AI, it has this one basic concept at its core.

Machine learning has changed the way every application is built, the way computing is done, and what is possible beyond that.

Now, everything related to AI came through GeForce (NVIDIA's personal-computer graphics brand); GeForce enabled AI to reach the masses. Now AI is coming home to GeForce. There are many things you can't do without AI, so let me show you.

(Demonstration video)


That's real-time computer graphics. No computer graphics researcher or scientist would have told you it's possible to ray trace every single pixel today. Ray tracing is a technique that simulates light, and the amount of geometry you saw is absolutely insane; without AI, it would be nearly impossible.

We did two basic things. Of course, we used programmable shading and ray traced acceleration to generate incredibly beautiful pixels.

But then we let the AI, conditioned on and controlled by those pixels, generate many more pixels. It knows what the colors should be, having been trained on an NVIDIA supercomputer, so the neural network running on the GPU can infer and predict the pixels we didn't render.

This is called DLSS (Deep Learning Super Sampling). The latest generation of DLSS also goes beyond individual frames: it can predict the future, generating three frames for every frame we compute.

For example, of the four frames you're looking at, one was rendered by us and the other three were generated.

Four frames at 4K is about 33 million pixels. Of those 33 million, we compute only 2 million with programmable shaders and our ray-tracing engine, and the AI predicts the other 31 million. That's an absolute miracle.
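
As a quick sanity check of that arithmetic (assuming "4K" here means the common 3840 x 2160 resolution), a few lines of Python reproduce the keynote's numbers:

```python
# Rough pixel arithmetic behind DLSS 4 multi-frame generation.
# Assumes "4K" = 3840 x 2160; the keynote rounds the total to ~33 million pixels.
width, height = 3840, 2160
frames = 4                          # 1 rendered frame + 3 AI-generated frames

total_pixels = width * height * frames
rendered_pixels = 2_000_000         # pixels actually shaded / ray-traced (keynote figure)
generated_pixels = total_pixels - rendered_pixels

print(f"total pixels across 4 frames: {total_pixels:,}")      # 33,177,600
print(f"AI-predicted pixels:          {generated_pixels:,}")  # ~31 million
print(f"fraction computed directly:   {rendered_pixels / total_pixels:.1%}")  # ~6%
```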

As a result, we are able to render at extremely high performance because the AI reduces the amount of computation. Of course, training it requires enormous amounts of compute, but once training is complete, generation is extremely efficient.

That's an incredible capability of AI, and that's why there are so many amazing things happening. We used GeForce to enable AI, and now AI is revolutionizing GeForce.

Blackwell family's newest GPU! RTX 50 series chips rock!

Folks, we're here today to announce the next generation of the RTX Blackwell family. Let's take a look.

(Demonstration video)


Watch. This is our new GeForce RTX 50 series chip, based on the Blackwell architecture.


This GPU is a real "beast": 92 billion transistors and 4,000 TOPS (trillion operations per second) of AI performance, three times that of the previous-generation Ada architecture.

To generate those pixels I just showed, we still need these:

  • 380 RT TFLOPS (trillion floating-point operations per second) of ray-tracing performance, so we can compute the most beautiful images;
  • 125 shader TFLOPS of shader performance; in fact there are dual shader pipelines of comparable throughput, one for floating-point arithmetic and one for integer arithmetic;
  • and GDDR7 memory from Micron with 1.8 terabytes per second of bandwidth, twice that of our previous generation, allowing us to mix AI workloads with computer-graphics workloads.

One of the amazing things about this generation is that programmable shaders are now able to handle neural networks as well. So shaders are able to host these neural networks, and as a result we invented neural texture compression and neural material shading.

By doing all of the above, you get these stunningly beautiful images that can only be achieved by using AI to learn textures, learn compression algorithms, and get extraordinary results.

This is the new RTX Blackwell 50 series, and even the mechanical design is a marvel. See, it has two fans, and the whole card is essentially one giant fan. So you might ask, does the graphics card really need to be that big? The voltage-regulator design is state-of-the-art; this GPU has an incredible design, and the engineering team did a great job. Thanks.

Next up is speed and cost. How does it compare? This is the RTX 4090. I know many of you have this card. At $1,599, it was definitely one of the best investments you could make. For $1,599, you brought it home to your $10,000 "PC entertainment center".


That's right, isn't it? Don't tell me I'm not right. This card is liquid cooled and has gorgeous lighting all around. You lock it up when you leave and it's a modern home theater that makes total sense.

And now, with the RTX 5070 from the Blackwell family, you can get that level of configuration and performance for just $549.


None of this would be possible without AI: without the 4,000 AI TOPS of the Tensor Cores, and without the GDDR7 memory.

Okay, this is the entire RTX 50 family, from the RTX 5070 all the way up to the RTX 5090, which has twice the performance of the 4090. We begin mass production in January.


It's really unbelievable, but we managed to install these GPUs into the laptop.

It's an RTX 5070 laptop priced at $1,299, with the performance of a 4090.


Can you visualize it? Does it make sense to shrink this incredible graphics card and put it in? There's nothing AI can't do.

The reason is that we generate most of the pixels with the Tensor Cores. We ray-trace only the pixels we need and generate the rest with AI. The result is simply incredible energy efficiency. The future of computer graphics is neural rendering, the fusion of AI and computer graphics.

What's really amazing is that we can now fit this family of GPUs into a laptop: the RTX 5090 fits in a thin laptop just 14.9 millimeters thick.

So, ladies and gentlemen, this is the RTX Blackwell family.

New scaling laws have emerged, where models can train themselves and apply different resource allocations

GeForce brought Artificial Intelligence (AI) to the world and popularized it. Now that AI has come back to revolutionize GeForce, let's talk about AI.

The entire industry is racing to scale up AI, and the scaling law is a powerful model: an empirical rule observed and proven by generations of researchers and industry.

The scaling law says that the more training data you have, the larger the model you can build, and the more computational power you invest, the more effective and capable the model becomes. And the scaling law continues.

Amazingly, the Internet generates about twice as much data each year as it did the year before. I think in the next few years, humans will generate more data than all of humanity has generated since the beginning of time combined.

We are also generating large amounts of multimodal data, including video, images, and sound. All of this data can be used to train the foundations of artificial intelligence.

However, there are actually two other new Scaling laws that have emerged that are somewhat intuitive.

The second scaling law is the "post-training scaling law".

Post-training scaling uses techniques such as reinforcement learning and human feedback. Basically, the AI generates an answer to a human query, and a human gives feedback. It's much more complicated than that, but this reinforcement-learning loop, fed with tons of high-quality prompts, keeps improving the AI's skills.

It can be fine-tuned for specific areas, such as becoming better at solving math problems and reasoning.

So it's essentially like having a mentor or coach giving you feedback after you've taken the course. You would take the test, get the feedback, and then self-improve. We also use reinforcement learning, artificial intelligence feedback, and synthetic data generation, which are techniques similar to self-practice, where you know the answer to a question, for example, and keep trying until you get the right answer.

Thus, the AI can be confronted with a complex, difficult problem that is functionally verifiable, one whose answer we understand, perhaps proving a theorem or solving a geometry problem. These problems prompt the AI to generate answers and learn how to improve itself through reinforcement learning; this is called post-training. Post-training requires a lot of computing power, but the end result produces incredible models.
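
As an illustration of that loop, here is a minimal sketch in Python. The toy problem, the propose_answer stub, and the reward rule are all invented for illustration; real post-training samples answers from a large model and uses the verified results to update its weights:

```python
import random

def propose_answer(problem):
    """Stand-in for a model sampling a candidate answer (a real system samples an LLM)."""
    return random.choice(problem["guess_space"])

def verify(problem, answer):
    """Functionally verifiable reward: checking an answer is easy even when producing it is hard."""
    return 1.0 if answer == problem["solution"] else 0.0

# A toy "hard" problem with a machine-checkable solution.
problem = {"guess_space": list(range(100)), "solution": 42}

# Post-training loop: generate, verify, keep what scored well.
good_attempts = []
for _ in range(1000):
    answer = propose_answer(problem)
    if verify(problem, answer) > 0:
        good_attempts.append(answer)  # in a real system these update the model's weights

print(f"verified-correct attempts collected: {len(good_attempts)}")
```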

The third scaling law relates to what's called test-time scaling. Test-time scaling means that when you use the AI, it can apply different resource allocations rather than just relying on its fixed parameters: it decides how much computing power to use to generate the required answer.

Reasoning is one way of thinking, and thinking for a long time is another, as opposed to direct, one-shot answers. You might reason through the problem, break it into multiple steps, or generate multiple candidate ideas and have your AI system evaluate which one is best; maybe it solves the problem step by step, and so on.
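
To make that concrete, here is a minimal sketch of one common test-time-scaling recipe, best-of-N sampling with a scorer. The generate and score functions are stand-ins for a real model and a real verifier or reward model; nothing here is from NVIDIA's stack:

```python
import random

def generate(prompt, seed):
    """Stand-in for sampling one candidate answer from a model."""
    rng = random.Random(seed)
    return f"candidate-{rng.randint(0, 9999)}"

def score(prompt, answer):
    """Stand-in for a verifier or reward model that judges answer quality."""
    return random.random()

def best_of_n(prompt, n):
    # Spending more test-time compute (larger n) buys a better expected answer.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Prove the theorem ...", n=1))   # cheap one-shot answer
print(best_of_n("Prove the theorem ...", n=64))  # 64x the inference compute
```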

Test-time scaling has now proven very effective. You're witnessing the evolution of this series of techniques and the emergence of all these scaling laws in the incredible progress from ChatGPT to o1 to o3, and now Gemini Pro; all of these systems have gone through the journey from pre-training to post-training to test-time scaling.

The amount of computing power we need is, of course, staggering, and indeed we want society to be able to scale up computing to produce more and more novel and better intelligence. Intelligence is the most valuable asset we have, and it can be applied to solve many very challenging problems. So the scaling laws are driving huge demand for NVIDIA computing, and for incredible chips like Blackwell.

Blackwell quadruples performance per watt over previous generation

Let's take a look at Blackwell, which is currently in full production and looks incredible.

First of all, every cloud service provider has systems up and running. We have systems here from about 15 computer manufacturers, producing about 200 different stock-keeping units (SKUs), 200 different configurations.

They include liquid-cooled and air-cooled designs, x86 as well as NVIDIA Grace CPU versions, NVLink 36x2 and 72x1, and many other system types, so we can meet the needs of virtually any data center in the world. These systems are currently being produced in 45 factories. This tells you how pervasive AI is and how quickly the whole industry is investing in this new model of computing.


The reason we're pushing so hard is that we need more computing power; that's very clear. This is the GB200 NVLink72: it weighs 1.5 tons and contains 600,000 components, with a spine behind it that connects all of these GPUs together using two miles of copper and 5,000 cables.


The system is manufactured in 45 factories around the world. We build them, liquid-cool them, test them, disassemble them, and ship them in parts to the data centers; because each one weighs 1.5 tons, it is reassembled and installed on site.

The manufacturing process is crazy, but the goal of it all is that the scaling laws are pushing computing to the level that Blackwell delivers.

Blackwell delivers four times the performance per watt and three times the performance per dollar of our previous generation. That basically means that in one generation we reduced the cost of training these models by a factor of three; or, if you want to triple the size of the model, the cost stays roughly the same. But the important thing is that the tokens these systems generate are used by all of us, in ChatGPT or Gemini and on our phones.

In the future, almost all of these applications will consume these AI tokens, which are generated by these systems. Every data center is constrained by power.

Therefore, if Blackwell's performance per watt is four times that of our previous generation, then the revenue that can be generated, the amount of business that can be done in the data center, quadruples. These AI factory systems really are factories today.
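
A quick back-of-the-envelope check of that argument, under the keynote's premise that the data center's power budget is the binding constraint and revenue scales with tokens served (the power figure below is illustrative only):

```python
# Toy model of a power-limited AI factory: revenue scales with tokens served.
# Assumptions (illustrative): fixed 10 MW power budget, revenue proportional to throughput.
POWER_BUDGET_WATTS = 10e6
old_tokens_per_sec_per_watt = 1.0   # normalized previous-generation efficiency
new_tokens_per_sec_per_watt = 4.0   # Blackwell's 4x performance per watt (keynote figure)

old_throughput = POWER_BUDGET_WATTS * old_tokens_per_sec_per_watt
new_throughput = POWER_BUDGET_WATTS * new_tokens_per_sec_per_watt

print(f"throughput (and revenue) multiplier at fixed power: {new_throughput / old_throughput:.0f}x")
```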

Now, the goal of all this is to create one giant chip. The amount of computing power we need is staggering, and this is basically one giant chip. If we had to build it as a single chip, it would obviously be the size of a wafer, and that's before accounting for yield, which would probably require it to be three to four times larger.

But here we basically have 72 Blackwell GPUs, or 144 GPU dies. This one unit has 1.4 exaFLOPS of AI floating-point performance; the world's largest, fastest supercomputer only recently exceeded 1 exaFLOPS. It has 14 TB of memory, with 1.2 petabytes per second of memory bandwidth, equivalent to all of the Internet traffic happening right now. The world's Internet traffic could be processed through these chips.

In total, we have 130 trillion transistors, 2,592 CPU cores, and a huge amount of networking. I wish I could lift the whole thing, but I don't think I can. These are the Blackwells, these are our ConnectX networking chips, and this is the NVLink; we tried to bring out the NVLink spine itself, but that just isn't possible.


These are the HBM (high-bandwidth memory) stacks, 14 terabytes of HBM memory, and this is what we're building. That's the miracle of the Blackwell system. The Blackwell die right here is the largest single chip in the world.

We need a lot of computational resources because we want to train larger and larger models.

In the past, inference was one-shot, but in the future AI will talk to itself, thinking and processing internally. Today, tokens are generated at 20 or 30 per second, about the limit of human reading speed; future models like o1, o3, and Gemini Pro will talk to themselves and reflect.

Therefore, it is conceivable that token generation rates will need to be extremely high. To ensure great quality of service and low cost for customers, and to drive continued AI expansion, we need to dramatically increase token generation rates while reducing cost. That's one of the fundamental reasons we created NVLink.

NVIDIA's three tools to help the ecosystem build AI agents: NVIDIA NIM, NVIDIA NeMo, and open-source blueprints

One of the major changes taking place in the enterprise world is the "AI agent".

An AI agent is a perfect example of test-time scaling. It is a system of models: some understand and interact with the customer or user, while others retrieve information from storage, such as semantic-search systems.

It may access the internet or open a PDF file, or it may use tools such as a calculator, or even utilize generative AI to generate charts and so on. And it's iterative, it will gradually break down the problem you're asking and process it through different models.

In the past, you asked a question and the answer spewed out. In the future, when you ask a question, a whole bunch of models will run in the background to respond. So with test-time scaling, the amount of computation required for inference will spike, and we should get much better-quality answers.

To help the industry build AI agents, our go-to-market strategy is not to go directly to enterprise customers but to work with the software developers in the IT ecosystem, integrating our technologies to enable new capabilities, just as we did with the CUDA libraries. Just as past computing models had APIs for computer graphics, linear algebra, or fluid dynamics, the future will see AI libraries introduced on top of these CUDA-accelerated libraries.

The first of our three tools for helping the ecosystem build AI agents is NVIDIA NIM, essentially packaged AI microservices. It takes all the complex CUDA software (cuDNN, CUTLASS, TensorRT-LLM, Triton) together with the models themselves, packages and optimizes them, and puts them into a container you can use however you please.
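
Because a NIM serves an OpenAI-compatible API, calling one from Python looks like calling any hosted LLM endpoint. A minimal sketch, assuming a NIM container is already running locally on port 8000; the model identifier is an example, and NVIDIA's hosted endpoints on build.nvidia.com would need a real API key instead:

```python
from openai import OpenAI  # pip install openai

# A NIM microservice exposes an OpenAI-compatible API; here we assume one is
# running locally in a container on port 8000 (URL and model id are illustrative).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # example NIM model identifier
    messages=[{"role": "user", "content": "Summarize what a NIM is in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```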


As a result, we have models for vision, language understanding, speech, animation, and digital biology, with some exciting new physical-AI models coming soon. These models can run in every cloud, because NVIDIA GPUs are now available in every cloud, and on OEM (original equipment manufacturer) systems as well.

So you can integrate these models into your software packages, create AI agents that run on ServiceNow, SAP, or Cadence, deploy them to your customers, and run them wherever they want to run their software.

The next tool is a system we call NVIDIA NeMo, essentially a digital-employee onboarding and evaluation system.

In the future, these AI agents will be a digital workforce working side by side with your employees to accomplish a variety of tasks for you. So bringing these specialized agents into your company is like onboarding new employees. We have different libraries to help train these agents on your company's specific language, perhaps vocabulary unique to the company, and on your particular business processes and ways of working.

Therefore, you need to give them examples to illustrate the criteria for the work product, and they will try to generate results that meet the criteria, while you give feedback and evaluate, and so on.

At the same time, you will set boundaries making clear what they are not allowed to do and say, and what information they may access. This whole digital-employee pipeline is called NeMo.
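
One concrete piece of that boundary-setting exists today as NVIDIA's open-source NeMo Guardrails library. A minimal sketch of its documented usage pattern, assuming a local ./guardrails_config directory containing the YAML and Colang files that define what the agent may and may not do:

```python
from nemoguardrails import LLMRails, RailsConfig  # pip install nemoguardrails

# Load a guardrails configuration (YAML + Colang files defining allowed topics,
# forbidden behaviors, and which LLM to use) from a local directory.
config = RailsConfig.from_path("./guardrails_config")  # assumed to exist
rails = LLMRails(config)

# Every exchange now passes through the configured input/output rails.
reply = rails.generate(messages=[
    {"role": "user", "content": "Tell me about our company's refund policy."}
])
print(reply["content"])
```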

In the future, every company's IT department will become a human-resources department for AI agents. Today they manage and maintain a range of software from the IT industry; in the future they will maintain, nurture, guide, and improve a whole suite of digital agents and provision them to the company.

In addition, we provide a whole set of blueprints for our ecosystem to use, all of which are completely open source; you're free to modify them. We have blueprints for many different types of agents.

Today we're also announcing a very cool and smart initiative: a family of Llama-based models, the NVIDIA Llama Nemotron language foundation models. Llama 3.1 is a remarkable achievement: it has been downloaded from Meta about 650,000 times and has been derived into some 60,000 different models. It's the main reason companies in almost every industry have started working on AI.


We realized the Llama models could be fine-tuned to better fit enterprise needs, so using our expertise and capabilities we fine-tuned them into the Llama Nemotron open-source model suite. Some are very small models with extremely fast response times, the Nano models; the Super models are essentially the mainstream models.

The Ultra model can serve as a teacher for other models: a reward-model evaluator, a judge that assesses the quality of other models' answers and provides feedback. It can be distilled in a variety of ways, acting both as a teacher model and as a source for knowledge distillation; it is powerful and widely available, and these models are online now. They top the leaderboards for chat, instruction following, and retrieval, the full range of capabilities AI agents need.

We are also working with the ecosystem: NVIDIA AI technologies are already deeply integrated with the IT industry. We have terrific partners, including ServiceNow, SAP, and Siemens, which is doing excellent work on industrial AI; Cadence and Synopsys are doing excellent work too. I'm proud to be working with Perplexity, who have revolutionized the search experience with terrific results.


Codeium will be the next huge AI application for every software engineer in the world; software coding is the next big service. There are 30 million software engineers in the world, and each one will have a software assistant helping them code; otherwise, they will be much less productive and the quality of their code will decline.

So 30 million is the figure in play, out of roughly a billion knowledge workers globally. It is clear that AI agents are likely to be the next robotics industry, promising trillions of dollars in business opportunities.

Next, I'll show some of the blueprints we've created with our partners and the results of that work. These AI agents are the new digital workforce, working for us and collaborating with us. An AI agent is a system of models that reasons about a specific task, breaks the task down, and retrieves data or uses tools to generate a high-quality response.

(Demonstration video)


AI will transform into an all-encompassing AI assistant

Okay, let's move on to AI.

AI is born in the cloud, it's a great experience in the cloud, and it's fun to use it on your phone. Soon, we'll have continuous AI that's always with us... Imagine putting on your Meta glasses and being able to ask for information just by pointing at or looking at something... Wouldn't that be cool?

The AI experience in the cloud is great, but our ambition goes further: we want AI to be ubiquitous. As mentioned earlier, NVIDIA AI can be easily deployed to any cloud and fitted into in-house systems, but what we most want is to get it onto personal computers.

As we all know, Windows 95 revolutionized the computer industry, bringing a range of novel multimedia services and rewriting the way applications are developed forever. But Windows 95's computing model is still quite limited and less than perfect for AI.

We're looking forward to a future where AI in PCs is a powerful assistant to everyone, adding generative APIs to our existing 3D, sound, and video APIs to produce stunning 3D content, dynamic language, and beautiful sounds. We'll have to craft a new system that leverages the huge upfront investment in the cloud while making this vision a reality.

The world isn't going to invent yet another way of programming AI, so it would be great if we could turn Windows PCs into world-class AI PCs. The answer is Windows WSL 2.

Windows WSL 2 is essentially two operating systems cleverly nested within one, and it's tailored to give developers direct and fast access to hardware.


It is deeply optimized for cloud-native applications, with a focus on CUDA, and it truly works out of the box. As long as the computer's performance keeps up, vision models, language models, speech models, creative animation, lifelike digital-human models, all of them run perfectly on the PC: download one and start a marvelous journey with one click.

Our goal is to make the Windows WSL 2 PC a best-in-class platform that we will support and maintain for a long time.
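
As a quick sanity check that this stack is wired up, here is a minimal probe to run inside a WSL 2 distribution. PyTorch is our example choice here, not something the keynote names; it assumes a CUDA-enabled PyTorch build is installed:

```python
# Run inside a WSL 2 distribution on a Windows AI PC to verify GPU passthrough.
# Assumes a CUDA-enabled PyTorch build, e.g.:
#   pip install torch --index-url https://download.pytorch.org/whl/cu124
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"CUDA visible from WSL 2: {name}, {mem_gb:.0f} GB")
else:
    print("No CUDA device visible; check the NVIDIA driver and WSL 2 GPU support.")
```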

Next, let me show you an example of a blueprint we just developed:

(Demonstration video)

NVIDIA AI is coming to hundreds of millions of Windows PCs around the world. We're already working closely with the world's top PC OEMs to get them ready for the AI era, so AI PCs will soon arrive in millions of homes and make life better.

NVIDIA Cosmos, the world's first foundation model built to understand the physical world

Next, let's focus on the frontier of physics AI.

Having talked about Linux, let's talk about physical AI. Imagine a large language model: you feed the context and prompt in on the left, and it generates tokens one by one to produce the output. The model in the middle is extremely large, with billions of parameters, and the context length can be quite significant, since a user may load several PDF files in one go, all of which are converted into tokens.


The Transformer's attention mechanism lets every token establish relationships with every other token, so with hundreds of thousands of tokens the computation grows quadratically.

The model pushes all of its parameters over the input sequence, through each Transformer layer, to generate a single token. That is why we need the compute of something like Blackwell just to generate the next token. This is what makes the Transformer so effective, and so computationally demanding.
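
In symbols, that quadratic growth comes from standard scaled dot-product attention (a reference formula, not anything specific to Blackwell): for a sequence of n tokens with head dimension d, forming the QK^T product alone costs O(n^2 d):

```latex
\[
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,
\qquad Q,K,V\in\mathbb{R}^{n\times d}
\]
\[
\text{cost}\!\left(QK^{\top}\right)=O(n^{2}d)
\;\Longrightarrow\;
\text{doubling the context length } n \text{ roughly quadruples the attention compute.}
\]
```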

If you replace the PDF with the robot's surroundings, and the question with a request, such as "go over there and bring me that box", the output is no longer text tokens but action commands. That makes perfect sense for the future of robotics, and the relevant technology is close at hand. But first we have to create an effective world model, as distinct from language models like GPT.

This world model has to understand the rules of the real world: gravity, friction, inertia, the physical dynamics of the world, as well as geometric and spatial relationships and cause and effect. Drop something and it falls to the floor; poke it and it tips over. It also has to understand object permanence: a ball that rolls across the kitchen counter and off the far side hasn't disappeared into another quantum universe; it's still there.

Most models today still struggle with understanding this type of intuitive knowledge, so we want to build a world base model.


Today, we have a big announcement to make: NVIDIA Cosmos, the world's first world foundation model built to understand the physical world. Seeing is believing, so come and see.

(Show video)

NVIDIA Cosmos, the world's first world foundation model, is trained on 20 million hours of video focused on dynamic physical scenes: nature, people walking, hands moving, objects being manipulated, fast camera movements. The goal is to teach the AI to understand the physical world, not to generate creative content. With physical AI, a great many downstream applications become possible.


We can use it for synthetic data generation to train and refine models, to bootstrap robot models, and to generate multiple physically grounded, physically plausible future scenarios, like Doctor Strange manipulating time, because the model understands the physical world.

You've also seen it generate a bunch of images. It can also caption videos: give it a video and it produces subtitles, and those caption-video pairs can be used to train multimodal large language models. So this one foundation model can be used to train both robots and large language models.


The platform includes autoregressive models for real-time applications, diffusion models for generating high-quality images, a superb tokenizer that learns a real-world "vocabulary", and a data pipeline. Because the data volumes are so large, if you want to train your own models on it, we've accelerated the pipeline from start to finish.

The Cosmos platform's data processing pipeline is accelerated by CUDA and AI.

Today, we're also announcing that Cosmos is openly licensed and available on GitHub, with small, medium, and large models, corresponding to fast models, mainstream models, and teacher (knowledge-transfer) models. We hope Cosmos gives robotics and industrial AI the boost that Llama 3 gave enterprise AI.
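
Since Cosmos ships as open model weights plus inference scripts rather than a one-line API, the exact entry points depend on the release. A hedged sketch of fetching a checkpoint from a model hub; the repository id below is an assumption, so check the NVIDIA Cosmos GitHub page for the real names:

```python
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Hypothetical repository id; consult the NVIDIA Cosmos GitHub README for the
# actual checkpoint names and the accompanying inference scripts.
repo_id = "nvidia/Cosmos-1.0-Diffusion-7B-Text2World"

local_dir = snapshot_download(repo_id=repo_id)
print(f"Cosmos checkpoint downloaded to: {local_dir}")
# From here, the repo's own inference scripts would generate video continuations
# from a text prompt or a conditioning clip, e.g. for synthetic-data pipelines.
```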

Physical AI will revolutionize the $50 trillion manufacturing and logistics industry

When connecting Cosmos to Omniverse, magic happens.

The fundamental reason is that Omniverse is a simulator built on algorithmic, first-principles physics. Connecting it to Cosmos gives Cosmos the ground truth with which to generate, control, and condition its output.

In this way, Cosmos's output is grounded in reality, just as connecting a large language model to a retrieval-augmented generation system grounds AI output in real references. The combination of the two makes a physics-simulated, physically based multiverse generator, with hugely exciting applications, clearest of all in robotics and industry.

Cosmos plus Omniverse, plus the computers that train AI, represent the three types of computers necessary to build robotic systems.

Every robotics company will eventually need three computers: a DGX computer for training the AI, and an AGX computer for deploying the AI in edge devices, such as cars, robots, and autonomous mobile robots (AMRs), for autonomous operation.


Connecting the two requires a digital twin, which is the very foundation of all the simulation.

The digital twin is where the trained AI practices, improves, generates synthetic data, and runs reinforcement learning and AI feedback; it is, in effect, the AI's digital twin.

These three computers work interactively. This three-computer system is NVIDIA's strategy for the industrial world, and we've been discussing it for a while: not so much a "three-body problem" as a "three-computer solution". It is NVIDIA's robotics platform.

Here are three examples.

The first example is the digitization of industry. Millions of factories and hundreds of thousands of warehouses around the world form the backbone of the $50 trillion manufacturing industry, and in the future all of it will have to be software-defined, automated, and infused with robotics.

We have partnered with Kion, the world's leading provider of warehouse automation solutions, and Accenture, the world's largest professional services provider, to focus on digital manufacturing and create special programs together, take a look.


Our go-to-market strategy is the same as for any other software or technology platform: through developers and ecosystem partners. More and more ecosystem partners are connecting to Omniverse, because everyone wants to digitize the industries of the future; there is so much waste, and so much automation opportunity, in that $50 trillion of global GDP.

(Show video)

In the future, everything will be simulated. Every factory will have a digital twin that uses Omniverse and Cosmos to generate a host of future scenarios; the AI picks the optimal ones, and those become the programming constraints deployed to the real factory.

Next-Generation Automotive Processor: Thor

The second example is self-driving cars.

After years of development and the success of Waymo and Tesla, the self-driving revolution is upon us.

We offer three types of computers for this industry: systems for training the AI; Omniverse and Cosmos, the simulation and synthetic-data-generation systems; and the computer in the car. Each automotive company may work with us differently, using one, two, or all three.

Almost every major automotive company in the world has partnered with us in one way or another: Waymo, Zoox, Tesla, BYD (the world's largest new-energy-vehicle company), Jaguar Land Rover with its super-cool new models, and Mercedes-Benz, which this year begins mass production of a line of cars built on NVIDIA technology.


We're particularly pleased to announce today that Toyota and NVIDIA have entered into a partnership to build the next generation of self-driving cars. Along with Lucid, Rivian, Xiaomi, Volvo and many more.

TuSimple is building autonomous trucks, and this week Aurora announced that it will build self-driving trucks with NVIDIA technology.

With 100 million vehicles produced globally every year, billions of vehicles on the road, and trillions of miles driven annually, all of which will someday be highly or fully autonomous, this is going to be a mega-industry. Just from the cars already on the road, we're generating $4 billion in revenue from this business, and we expect about $5 billion this year, so the potential is huge.

Today, we are releasing our next generation automotive processor, Thor.


This is Thor, a robotics computer. It processes massive amounts of sensor information, streams of data from countless cameras, high-resolution radar, and lidar, and converts it all into tokens to feed into a Transformer that predicts the path ahead.

Thor is now in full production and has 20 times the processing power of its predecessor, Orin, which is the standard for today's self-driving vehicles.


Thor is not only for automobiles; it is a full robotics computer that can serve as the brain and manipulator controller of complete robots, such as AMRs (autonomous mobile robots) or humanoid robots: a general-purpose robotics computer.

I'm also particularly proud to announce that our safety-focused DriveOS is now the first software-defined, programmable AI computer to be certified to ASIL-D, the highest automotive functional-safety standard, a remarkable achievement that brings functional safety to CUDA. If you're building robots with NVIDIA CUDA, you're good to go.


Here's how things work in a self-driving scenario with Omniverse and Cosmos. Today, we're not just going to show you videos of cars running on the road; we'll also show how AI automatically reconstructs a digital twin of a car and uses that capability to train future AI models. Check it out.

(Show video)

Isn't that incredible?

Thousands of drives can become billions of miles of training data. We'll still need real vehicles on the road collecting data continuously, but the ability to generate synthetic data with this physically grounded, reality-faithful multiverse gives self-driving AI massive amounts of accurate, plausible data to train on.

The autonomous driving industry is gaining momentum, and it is incredibly exciting to see how the pace of autonomous driving development will increase dramatically over the next few years, just as computer graphics technology is changing by leaps and bounds.

The "ChatGPT moment" of general-purpose robotics is almost within reach

Let's talk more about humanoid robots.

The "ChatGPT moment" in general-purpose robotics is close at hand, and the enabling technologies I've talked about will lead to fast and amazing breakthroughs in general-purpose robotics over the next few years.


General-purpose robots are important because robots on tracks and wheels need environments specially adapted to them, whereas three types of robots need no special venue and fit perfectly into the world we have already built, which makes them ideal.

The first category is agentic AI robots, information workers: as long as the office computer has enough compute, these information-worker robots can make a big difference.

The second category is self-driving cars; after all, we've spent over a hundred years building roads and cities.

The third category is humanoid robots. If we crack the technologies behind these three types of robots, this will become the largest technology industry the world has ever seen, so the robotics era is coming soon.

The key is how to train these robots. Imitation data is hard to capture for humanoid robots: we generate driving data all the time simply by driving, but capturing human demonstrations for humanoid robots is laborious and time-consuming.

So we had to come up with a clever way to use AI and Omniverse to expand hundreds of human demonstrations into millions of simulated motions from which the AI can learn how to perform the task. Below, we show you exactly how that's done.

Developers around the world are building the next generation of physical AI, also known as embodied robots and humanoid robots. Developing general-purpose robot models requires massive amounts of real-world data, which is expensive to collect and curate. The NVIDIA Isaac GR00T platform was created to give developers four key tools: robot foundation models, data pipelines, a simulation framework, and the Thor robotics computer.

NVIDIA Isaac GR00T's synthetic motion generation blueprint, a set of simulation workflows for imitation learning, lets developers generate exponentially larger datasets from a small number of human demonstrations.

First, with GR00T-Teleop, skilled workers can enter the robot's digital twin using an Apple Vision Pro.

This means an operator can capture data even without a physical robot, and can manipulate the robot in a risk-free environment with no physical damage or wear. To teach a robot a task, the operator captures motion trajectories through a handful of teleoperated demonstrations, then uses GR00T-Mimic to expand those trajectories into a much larger dataset.

Next, domain randomization and 3D-to-real scene scaling are applied with GR00T-Gen, built on Omniverse and Cosmos, to generate datasets of exponentially increasing size. The Omniverse and Cosmos multiverse simulation engine provides massive datasets for training robot policies. Once a policy is trained, developers run software-in-the-loop testing and validation in Isaac Sim before deploying it to a real robot.
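
The multiplication step is easy to picture in code. Below is a minimal sketch of expanding a few demonstrations by domain randomization; the trajectory format and the perturbation scheme are illustrative stand-ins for what GR00T-Mimic and GR00T-Gen actually do:

```python
import random

def randomize(trajectory, seed):
    """Perturb one demonstration: jitter waypoints and object pose (illustrative)."""
    rng = random.Random(seed)
    jitter = lambda x: x + rng.gauss(0.0, 0.01)           # ~1 cm positional noise
    return {
        "waypoints": [tuple(map(jitter, wp)) for wp in trajectory["waypoints"]],
        "object_pose": tuple(map(jitter, trajectory["object_pose"])),
    }

# A handful of human teleop demonstrations...
demos = [
    {"waypoints": [(0.0, 0.0, 0.0), (0.1, 0.2, 0.1)], "object_pose": (0.5, 0.3, 0.0)},
    {"waypoints": [(0.0, 0.1, 0.0), (0.2, 0.2, 0.1)], "object_pose": (0.5, 0.2, 0.0)},
]

# ...multiplied into a far larger synthetic dataset for policy training.
VARIANTS_PER_DEMO = 10_000
dataset = [randomize(d, seed=i) for d in demos for i in range(VARIANTS_PER_DEMO)]
print(f"{len(demos)} demos -> {len(dataset)} training trajectories")
```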

Powered by NVIDIA Isaac GR00T, the age of general-purpose robotics is upon us.

We'll have tons of data for robot training. The NVIDIA Isaac GR00T platform gives the robotics industry the key technology elements to accelerate the development of general-purpose robots.

AI supercomputer goes to the desktop

There's one more project I have to show you. None of this would be possible without a super-awesome project we started ten years ago, known inside the company as Project DIGITS: the Deep Learning GPU Intelligence Training System.

Before launch, we streamlined the name to DGX so it would harmonize with RTX, AGX, OVX, and the rest of the company's products, and the DGX-1 was born and went on to revolutionize the AI space.

In the past, building a supercomputer meant building your own facilities and infrastructure, a huge undertaking. We built the DGX-1 to give researchers and startups an AI supercomputer right out of the box.

In 2016, I delivered the first DGX-1 to a startup called OpenAI, and Elon Musk, Ilya Sutskever, and many other engineers were there to celebrate its arrival.

Clearly, it transformed the field of artificial intelligence and computing. But today AI is everywhere, not just in research organizations and startup labs. As mentioned at the beginning, AI has become the new way of computing, of building software, and every software engineer, creative artist, anyone who uses a computer as a tool needs an AI supercomputer.

I've always wished the DGX 1 was a little smaller. Imagine that, ladies and gentlemen.

This is NVIDIA's latest AI supercomputer, currently called Project Digits. If you have a better name, feel free to let us know.


What's awesome is that it's an AI supercomputer that runs NVIDIA's entire AI stack. All of NVIDIA's software runs on it; DGX Cloud runs on it. Put it wherever you like, connect it wirelessly, use it as a workstation, or access it remotely like a cloud supercomputer. NVIDIA AI runs it all.


It's based on a super-secret chip called GB110, the smallest Grace Blackwell chip we make. Show everyone the inside.


Isn't it super cute?

This chip is in production. This highly confidential chip was built in collaboration with MediaTek, a leading global system-on-chip (SoC) company, and it connects the CPU to NVIDIA's GPU over chip-to-chip NVLink. It's expected to be available around May. It's so exciting.


It probably looks something like this. Whether you use a PC or a Mac doesn't matter: it's a cloud platform that sits on your desk, and you can even use it as a Linux workstation. If you want more, link multiple units together with ConnectX to bring in more GPUs; out of the box, you have a full supercomputing stack. This is NVIDIA Project Digits.

As I just said, we have three new Blackwell systems in production, in addition to the Grace Blackwell supercomputer and the NVLink72 systems now in production around the world.

A stunning world foundation model, the world's first physical-AI foundation model, has been open-sourced, energizing robotics and other industries around the globe; and three types of robots, agentic AI robots, self-driving cars, and humanoid robots, are all taking shape. It has been a fruitful year. Thank you for your partnership, thank you for being here. I made a short video looking back at last year and ahead to the coming year. Play it.
