Jen-Hsun Huang's latest 10,000-word interview: AGI is coming, AI will revolutionize productivity

On October 4, NVIDIA CEO Jen-Hsun Huang appeared as a guest on the talk show Bg2 Pod for a wide-ranging conversation with hosts Brad Gerstner and Clark Tang.

They focused on topics such as how to scale intelligence to AGI, NVIDIA's competitive advantage, the importance of inference and training, future market dynamics in the AI space, the impact of AI on various industries, Elon's Memphis Supercluster and X.ai, OpenAI, and more.

Jen-Hsun Huang emphasized the rapid evolution of AI technology, especially the breakthroughs on the path to artificial general intelligence (AGI). He stated that AGI assistants are coming soon in some form and will get better over time.

Jen-Hsun Huang also shared NVIDIA's leadership in the computing revolution, noting that by lowering the cost of computing and innovating hardware architectures, NVIDIA has built a significant advantage in driving machine learning and AI applications. He specifically mentioned NVIDIA's "moat": a decade-long ecosystem of hardware and software that makes it difficult for competitors to surpass with a single chip improvement.

In addition, Jen-Hsun Huang praised xAI and Musk's team for building the 100,000-GPU Memphis supercluster in just 19 days, calling it an "unprecedented" achievement. The cluster is undoubtedly one of the fastest supercomputers in the world and will play an important role in AI training and inference tasks.

Talking about the impact of AI on productivity, Jen-Hsun Huang was optimistic that AI will greatly improve the efficiency of enterprises and bring more growth opportunities, and will not lead to mass unemployment. At the same time, he also called on the industry to strengthen its focus on AI security to ensure that the development and use of the technology benefits society.

The key points of the full text are summarized below:

  • AGI assistants are coming soon in some form... At first they will be very useful, but not perfect. Then over time they will become more and more perfect.
  • We have reduced the marginal cost of computing by a factor of 100,000 in 10 years. Our entire stack is growing, and our entire stack is innovating.
  • People think the reason to design a better chip is that it has more FLOPS, more bits and bytes... But machine learning isn't just about software; it's about the entire data pipeline.
  • It's the machine learning flywheel that counts. You have to think about how to make this flywheel faster.
  • Simply having a powerful GPU does not guarantee a company's success in AI.
  • Musk's unique understanding of engineering, building large systems, and marshaling resources... 100,000 GPUs as a single cluster... in 19 days.
  • AI won't change every job. But it will have a huge impact on the way people work. When companies use AI to improve productivity, it usually shows up as better earnings or growth.

The Evolution of AGI and AI Assistants

Brad Gerstner:

This year's theme is Extending Intelligence to AGI. When we did this two years ago, we did it in the age of AI, and that was two months before ChatGPT, which is incredible considering all these changes. So I think we can start with a thought experiment and a prediction.

If I think of AGI colloquially as a personal assistant in my pocket, that kind of pocket assistant, it knows everything about me. It has a perfect memory of me and can communicate with me. It can book a hotel for me or make a doctor's appointment for me. Looking at the speed of change in today's world, when do you think we will have personal assistants like that?

Jen-Hsun Huang:

Soon, in some form or another. And this assistant will get better and better as time goes on. That's the wonder of technology as we know it. So I think at first it's going to be very useful, but not perfect. And then over time it will become more and more perfect. Like all technology.

Brad Gerstner:

When we look at the pace of change, I think Musk said that the only thing that really matters is the pace of change. We do feel like the pace of change has dramatically accelerated, and it's the fastest pace of change we've ever seen on these issues because we've been poking around in AI for a decade, if not longer. Is this the fastest rate of change you've seen in your career?

Jen-Hsun Huang:

This is because we reinvented computing. A lot of this happened because we reduced the marginal cost of computing by a factor of 100,000 in 10 years; Moore's Law over the same period would have delivered roughly 100x. We achieved this in a number of ways. First, we introduced accelerated computing, moving work that is inefficient on the CPU onto the GPU. We achieved this by inventing new numerical precisions. We did this by inventing new architectures, inventing the Tensor Core, building NVLink into the system, adding very, very fast memory, and scaling across the entire stack with NVLink. Basically, everything I've described about how NVIDIA does things has led to a rate of innovation that is beyond Moore's Law.
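
To put those two curves side by side, here is a minimal back-of-envelope sketch in Python; the figures are the round numbers from the conversation, and the arithmetic is only illustrative:

```python
# Implied annual improvement rates: a 100,000x marginal-cost reduction
# over 10 years versus Moore's Law's roughly 100x over the same decade.
claimed_gain = 100_000
moore_gain = 100
years = 10

annual_claimed = claimed_gain ** (1 / years)  # ~3.16x per year
annual_moore = moore_gain ** (1 / years)      # ~1.58x per year

print(f"NVIDIA's stated pace: ~{annual_claimed:.2f}x per year")
print(f"Moore's Law pace:     ~{annual_moore:.2f}x per year")
```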

Now what's really amazing is that since then, we've moved from manual programming to machine learning. The amazing thing about machine learning is that it learns very quickly; that's proven to be true. So we reformulated the way we allocate computation and did many kinds of parallelism: tensor parallelism, all kinds of pipeline parallelism. We're good at inventing new algorithms and new training methods on top of that, and all of these techniques, all of these inventions, stack on top of each other.
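
As a concrete illustration of one of those techniques, here is a minimal sketch of tensor parallelism, with NumPy arrays standing in for devices; the two-way split and the matrix sizes are assumptions chosen for illustration:

```python
import numpy as np

# Tensor (column) parallelism in miniature: shard a weight matrix across
# two simulated "devices", compute partial results, and gather them.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # activations: batch x hidden
W = rng.standard_normal((8, 16))   # weights: hidden x output

W0, W1 = np.hsplit(W, 2)           # each device holds half the columns
y0 = x @ W0                        # partial result on "device 0"
y1 = x @ W1                        # partial result on "device 1"
y = np.concatenate([y0, y1], axis=1)  # the all-gather step

assert np.allclose(y, x @ W)       # identical to the unsharded matmul
```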

In retrospect, if you look at how Moore's Law worked, software was static. It was pre-compiled, like shrink-wrapped software put on a store shelf. It was static, and the hardware underneath grew at the rate of Moore's Law. Now, our whole stack is growing, and the whole stack is innovating. So I think now we're suddenly seeing scaling.

That's certainly remarkable. But we used to talk about pre-training models and scaling at that level: how we doubled the size of the model and therefore doubled the data size accordingly, so that the amount of computing power required quadrupled every year. That was a big thing. But now we're seeing scaling for post-training, and we're seeing scaling for inference. People used to think that pre-training was hard and inference was easy. Now everything is hard. That makes sense; the idea that all human thinking is one-shot is kind of ridiculous. So there has to be a concept of fast thinking, slow thinking, reasoning, reflection, iteration and simulation. And now it's emerging.
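
The quadrupling falls out of compute scaling with both model size and data size. Here is a sketch using the widely cited C ≈ 6ND rule of thumb; the rule and the model sizes below are assumptions for illustration, not figures from the interview:

```python
# Rule-of-thumb training compute: C ≈ 6 * N (parameters) * D (tokens).
def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer."""
    return 6 * n_params * n_tokens

base = train_flops(1e9, 20e9)    # hypothetical 1B-param model, 20B tokens
scaled = train_flops(2e9, 40e9)  # double the parameters and the data

print(f"compute ratio: {scaled / base:.0f}x")  # -> 4x
```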

NVIDIA's competitive moat

Clark Tang:

I think one of the most misunderstood things about NVIDIA is how deep the real NVIDIA moat goes. I think there's a perception that if someone invents a better chip, they've won. But the truth is, you've spent a decade building the full stack, from GPU to CPU to networking, and especially the software and libraries that support the applications that run on NVIDIA. So you've talked about that, but when you think about the moat that NVIDIA has today, do you think NVIDIA's moat is bigger or smaller today than it was three or four years ago?

Jen-Hsun Huang:

Well, I appreciate that you recognize how computing has changed. In fact, it was thought (and many people still think) that the reason you design a better chip is that it has more FLOPS, more bits and bytes. Do you see what I mean? You'll see the slides from their keynotes, with all these FLOPS and bar charts and that sort of thing. These are great. I mean, look, horsepower does matter. It does. So these things fundamentally matter.

Unfortunately, however, that's old thinking. It's old thinking in the sense that the software used to be some application running on Windows, and the software was static, right? That meant the best way to improve the system was to build faster and faster chips. But we realized that machine learning is not human programming. Machine learning isn't just about software; it's about the entire data pipeline. In fact, it's the flywheel of machine learning that matters. So how do you think about enabling this flywheel? On the one hand, it's about enabling data scientists and researchers to work efficiently in this flywheel, and the flywheel starts at the very beginning. A lot of people don't even realize that it takes AI to curate the data used to teach an AI, and that AI itself is pretty complex.

Brad Gerstner:

Is the AI itself improving? Is it accelerating too? Again, when we think about competitive advantage, yes, yes. It's a combination of all of those.

Jen-Hsun Huang:

Exactly, it's the availability of smarter AI to curate the data that has led to this. We even now have synthetic data generation and all sorts of different ways of curating data and presenting data to it. So you're doing a lot of data processing before you even train. So people think, oh, PyTorch, that's the beginning of the world and the end of the world. It is very important.

But don't forget, there is work before and after PyTorch. The flywheel is the way you have to think about it: how should I think about the whole flywheel, and how should I design a computing system, a computing architecture, that helps you utilize that flywheel and make it as efficient as possible? It's not just training one application. Does that make sense? That's just one step. Okay. Every step on the flywheel is hard. So the first thing you should do is not think about how to make Excel faster or how to make Doom faster; that's in the past, isn't it? Now you have to think about how to make this flywheel faster. There are a lot of different steps in this flywheel, and machine learning is not easy, as you all know.

What the OpenAI or X or Gemini teams do is not easy; they think deeply about this. I mean, what they do is not easy. So we decided, look, this is what you should be thinking about. It's the whole process, and you want to accelerate every part of it. You want to respect Amdahl's Law, and Amdahl's Law suggests that if one step is 30% of the time, and I accelerate it by three times, then I haven't really accelerated the whole process. Does that make sense? You really want to create a system that accelerates every step of the way, because only by doing the whole thing can you really materially improve the cycle time, and the flywheel, the learning rate, is ultimately what leads to exponential growth.
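
To make the Amdahl's Law point concrete, here is a minimal sketch using the 30%-of-time, 3x-speedup example from the conversation:

```python
# Amdahl's Law: accelerating a fraction p of the pipeline by a factor s
# bounds the overall speedup to 1 / ((1 - p) + p / s).
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# A step that is 30% of total time, accelerated 3x, improves the whole
# flywheel by only ~1.25x.
print(f"{amdahl_speedup(0.30, 3.0):.2f}x overall")
# Accelerating (nearly) every step is what actually compounds:
print(f"{amdahl_speedup(0.99, 3.0):.2f}x overall")
```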

So, what I'm trying to say is that our view of what the company should really do is reflected in the product. Notice I keep talking about this flywheel, the whole thing. Yeah, that's right. We accelerate everything.

Right now, the main focus is on video. A lot of people are focused on physical AI and video processing. Imagine the front end: there are terabytes of data coming into the system every second. As an example, a pipeline receives all of this data and first prepares it for training. Yes, so the whole process can be accelerated.

Clark Tang:

Today people only think about text models. Yes, but the future is these video models, and using text models like o1 to really process a lot of data before we even get there.

Jen-Hsun Huang:

Yes. So language models will be involved in everything. The industry has spent a tremendous amount of technology and effort to train these large language models. Now, we use large language models at every step of the way. That's pretty remarkable.

Brad Gerstner:

What I hear you saying is that in a combined system, yes, the advantage grows over time. So I hear you saying that we have a greater advantage today than we did three to four years ago because we're improving each component. That's the combination. And when you think about, for example, Intel as a business case study, relative to where you are now, it had the dominant model, the dominant position in the stack. Maybe, to summarize a little more briefly, compare your competitive advantage to the competitive advantage they had at the peak of their cycle.

Jen-Hsun Huang:

Intel is different because they were probably the first company to excel in manufacturing, in process engineering and manufacturing. Manufacturing, as I said, is making chips. Designing chips, building them on the x86 architecture, and making faster and faster x86 chips is where their talent lay, and they blended that with manufacturing.

Our company is a little different. We recognize that, in fact, parallel processing does not require every transistor to perform well; serial processing does. Parallel processing wants lots of transistors that are more cost-effective. I'd rather have 10 times more transistors that are 20% slower than 10 times fewer transistors that are 20% faster. Does that make sense? They want the opposite. So single-threaded performance, single-threaded processing, and parallel processing are very different. We observed that our world is not the one where single threads keep getting better; we want to do that as well as we can, but our world is the one that genuinely keeps getting better through parallelism.

Parallel computing, parallel processing is hard because each algorithm requires a different way of refactoring and re-architecting the algorithm. What people don't realize is that you can have three different CPUs, each with its own C compiler, and you can compile software for each of those architectures.

That is not possible in accelerated computing. The company that comes up with the architecture has to come up with its own OpenGL. So we revolutionized deep learning because we have a domain-specific library called cuDNN (the deep neural network library), a domain-specific library called cuOpt, and a domain-specific library called cuQuantum.
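
As a rough illustration of how applications reach those domain-specific libraries, here is a minimal sketch, assuming a machine with PyTorch installed; on an NVIDIA GPU, PyTorch dispatches this convolution to cuDNN, and it falls back to the CPU otherwise:

```python
import torch

# Frameworks reach cuDNN indirectly: on an NVIDIA GPU, PyTorch dispatches
# this convolution to cuDNN kernels. Falls back to CPU if no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.backends.cudnn.benchmark = True  # let cuDNN autotune its algorithms

conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).to(device)
x = torch.randn(8, 3, 224, 224, device=device)
y = conv(x)                            # cuDNN-backed when device == "cuda"
print(y.shape)                         # torch.Size([8, 64, 224, 224])
```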

Brad Gerstner:

For the domain-specific algorithms that sit below the PyTorch layer that everyone focuses on. Like I hear all the time.

Jen-Hsun Huang:

If we hadn't invented them, none of the applications on top would work. Do you understand what I'm saying? So it's the algorithms that NVIDIA is really good at: fusing the science on top with the underlying architecture. That's what we're really good at.

NVIDIA is building a complete AI computing platform, including hardware, software and an ecosystem

Clark Tang:

All the attention is now focused on inference. But I remember, two years ago, when I had dinner with Brad, I asked you a question: do you think your moat will be as strong on the inference side as it is on the training side?

Jen-Hsun Huang:

I'm not sure I said it would be stronger.

Clark Tang:

You just mentioned a lot of these elements, the composability between the two, or, we don't know yet, the overall combination. It's very important for customers to be able to maintain flexibility between the two. But since we're in the age of inference now, can you talk about that?

Jen-Hsun Huang:

Training is inference at scale. I mean, you're right. If you train it properly, then there's a good chance you're going to infer properly, and if you built it on this architecture, it's going to run on this architecture without any extra thought. You can still go ahead and optimize it for other architectures, but at least, because it was built on NVIDIA, it will run on NVIDIA.

Now, the other aspect, of course, is the capital investment aspect, which is that when you train a new model, you want to train it on your best new equipment. That leaves behind the equipment you used yesterday, which is perfect for inference. So there's a trail of free, compatible equipment behind the new infrastructure. We're very strict about making sure that we're always compatible, so that everything we leave behind continues to be excellent.

Now, we also put a lot of effort into constantly inventing new algorithms, so that when the time comes, the Hopper architecture is two, three, four times better than it was when they purchased it, and that infrastructure continues to be really effective. So all the work we do, improving new algorithms and new frameworks, helps every installed base we have. Hopper is better for it, Ampere is better for it, even Volta is better for it.

I think Sam Altman just told me that they recently decommissioned OpenAI's Volta infrastructure. So we leave behind this trail of installed base, and, like every computing installed base, it matters. NVIDIA is in every cloud, on-premises, and at the edge.

VILA, our visual language model, is created in the cloud and, without modification, works perfectly at the edge in a robot. It all has good compatibility. So I think architectural compatibility is very important for large devices, just as it was for the iPhone and other devices. I think the installed base is very important for inference.

Jen-Hsun Huang:

But what really benefits us is that, in trying to train these large language models on new architectures, we're able to think about how to create architectures that will perform well at inference when that day comes. So we've been thinking about iterative models for reasoning and how to create very interactive inference experiences for your personal agent. You don't want it to go off and think for a while after you finish talking. You want it to interact with you very quickly. So how do we create something like that?

NVLink lets us deploy these systems, which are perfect for training, and still get excellent inference performance when training is done. You want to optimize time to first token, and time to first token is actually very hard to do, because it requires a lot of bandwidth. And if your context is also rich, then you need a lot of FLOPS. So you need an infinite amount of bandwidth and an infinite amount of FLOPS at the same time in order to achieve a response time of a few milliseconds. This architecture is really hard to build. We invented Grace Blackwell NVLink for that.
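
To see why time to first token stresses both resources at once, here is a rough sketch of the prefill phase; every number in it is an illustrative assumption, not an NVIDIA figure:

```python
# Rough prefill model for time-to-first-token; all numbers are assumed.
params = 70e9             # hypothetical 70B-parameter model
context = 8_000           # prompt tokens to prefill
flops = 1e15              # 1 PFLOP/s of sustained compute
bandwidth = 3e12          # 3 TB/s of memory bandwidth
bytes_per_param = 2       # fp16 weights

# ~2 FLOPs per parameter per token for the prefill pass (compute-bound),
# and at least one full read of the weights (memory-bound).
compute_s = 2 * params * context / flops
memory_s = params * bytes_per_param / bandwidth

print(f"compute-bound: {compute_s:.3f}s, memory-bound: {memory_s:.3f}s")
print(f"first token no sooner than ~{max(compute_s, memory_s):.3f}s")
```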

Brad Gerstner:

I had dinner with Andy Jassy (President and CEO of Amazon) earlier this week, and Andy said, we have Trainium and Inferentia coming up. Again, I think most people look at these as problems for NVIDIA. But the next thing he said was that NVIDIA is a key partner for us and will continue to be a key partner for us. As far as I can see, the world will rely on NVIDIA in the future.

So when you think about the custom ASICs that are being built, they're going to be used for targeted applications. Maybe it's Meta's inference accelerator, maybe it's Amazon's Trainium, or Google's TPU. And then you think about the supply shortages you're facing today. Do those factors change that dynamic? Or will they complement the systems they're buying from you?

Jen-Hsun Huang:

We're just doing different things. Yes, we're trying to accomplish different things. NVIDIA is trying to build a computing platform for this new world, this machine learning world, this generative AI world, this agentic AI world. One of the things that's so profound is that, after 60 years of development, we've reinvented the entire computing stack: from programming to machine learning, from CPUs to GPUs, from software to AI, from software tools to AI frameworks. Every aspect of the computing stack and the technology stack has changed.

What we want to do is create a computing platform that is available everywhere. That's really the complexity of what we're doing: if you think about what we're doing, you realize that we're building an entire AI infrastructure, and we think of it as one computer. As I said before, the data center is now the unit of computing. For me, when I think about a computer, I'm not thinking about a chip. I'm thinking about this thing, my mental model of all the software, all the orchestration, all the machinery that's inside. That is my computer.

We try to build a new one every year. Yeah, it's crazy. No one's ever done that before. We try to build a brand new one every year. Every year, we deliver two to three times the performance. So every year, we reduce the cost by a factor of two to three. Every year, we improve energy efficiency by a factor of two to three. So we ask our customers not to buy everything at once, but just a little bit each year, right? Right. The reason is that we want their costs to average out over time. Right now, everything is architecturally compatible, so it would be very difficult to build these things individually at the rate we're going.

Now, the doubly difficult part is that we take all of that and, rather than selling it as infrastructure or as a service, we disaggregate all of it. We integrate it into GCP, AWS, Azure, X. Everybody's integration is different. We have to integrate all of our architectural libraries, all of our algorithms, and all of our frameworks into their frameworks. We integrate our security system into their systems, we integrate our networking into their systems, right? And then we basically do 10 integrations, and we do that every year. That's the miracle.

Brad Gerstner:

I mean, you try to do this every year, which is crazy. So what drives you to do this every year?

Jen-Hsun Huang:

Yeah, that's what you see when you break it down systematically. The more you break it down, the more everyone who breaks it down is surprised. Yes. How is it that the entire electronics ecosystem today is committed to working with us, ultimately building this computer that integrates into all of these different ecosystems and coordinates so seamlessly? So clearly what we propagate backward are APIs, methodologies, business processes, and design rules, and what we propagate forward are methodologies, architectures, and APIs.

Brad Gerstner:

That's what they were supposed to be.

Jen-Hsun Huang:

They've been at it for decades. Yes, and evolving as we do. But these APIs have to be integrated.

Clark Tang:

All someone has to do is call the OpenAI API and it works. That's it.

Jen-Hsun Huang:

Yeah. Yeah, it's a little crazy. It's a whole thing. It's what we invented: this massive computing infrastructure that the whole planet is working with. It blends in everywhere. You can sell it through Dell, you can sell it through HP. It's hosted in the cloud. It's everywhere and nowhere. People are using it now in robotic systems, in robots and humanoid robots, in self-driving cars. They're all architecturally compatible. Pretty crazy.

Brad Gerstner:

This is crazy.

Jen-Hsun Huang:

I don't want you to get the impression that I didn't answer the question. In fact, I did answer it. What we really laid down is the foundation, I mean the way of thinking. We're just doing something different. Yes, as a company, we want to stay informed, and I'm very knowledgeable about everything around the company and the ecosystem, right?

I know everyone else is doing something else with what they're doing. Sometimes that works against us, sometimes it doesn't. I'm very aware of that, but it doesn't change the goal of the company. Yes, the company's only goal is to build a platform architecture that can be used everywhere. That is our goal.

We're not trying to take share from anyone. NVIDIA is a market maker, not a share taker. If you looked at our company slides, the ones we don't show, you'd see that this company never spends a day talking about market share internally. All we talk about is how we create the next thing.

What is the next problem we can address in this flywheel? How can we serve people better? How can we take a flywheel that used to take about a year and shorten it to about a month? Yes. And what's the speed of light, right?

So we're thinking about all these different things. We know everything about everything, but we're sure that our mission is very unique. The only question is whether that mission is necessary. Does it make sense? All companies, all great companies, should have this at their core: what are you doing?

Of course. The only question is, is it necessary? Does it have value? Yes. Does it have impact? Does it help people? Say you're a developer, a generative AI startup, and you're about to decide how you want to become a company.

One choice you don't have to make is which ASIC to support. If you just support CUDA, you can go anywhere. You can always change your mind later. But we are the gateway to the AI world, aren't we?

Once you decide to join our platform, you can postpone all the other decisions. You can always build your own chip later. We are not against it. We don't get mad about it. When I work with all the clouds, GCP, Azure, we show them our roadmap years in advance.

They don't show us their chip roadmaps, and that has never offended us. Does that make sense? If you have a singular purpose, your purpose makes sense, and your mission is precious to you and to others, then you can be transparent. Note that my roadmap is shown transparently at GTC. My roadmap goes even deeper for our friends at Azure, AWS, and others. We have no problem doing any of this, even as they build their own ASICs.

Brad Gerstner:

I think when people look at the business, you recently said that demand for Blackwell is insane. You said that one of the hardest parts of the job is saying "no" to people, with all the emotion involved, when the world lacks the computing power that you can produce and deliver. But the critics say: wait a minute, this is like Cisco in 2000, when we overbuilt fiber; it's going to be boom and bust. I think about when we had dinner in early '23. At that dinner in January '23, NVIDIA's forecast was that revenue in 2023 would be $26 billion. You hit $60 billion.

Jen-Hsun Huang:

Just let the facts come out. This is the biggest prediction failure in the history of the world. Right. We can at least admit it.
