DeepSeek Storm: How Low Cost and High Performance Are Reshaping the AI Industry Chain

Article Summary:
DeepSeek is changing the AI industry chain with significant low-cost advantages
- 🚀 DeepSeek launches better-performing large models at less than a tenth of competitors' cost
- 🌐 Global tech giants are deploying DeepSeek APIs in a wave of low-cost AI
- 🔍 The AI industry chain benefits as related companies rush to adapt and innovate

On the night of New Year's Eve in the Year of the Snake, Yuan Jinhui, founder of SiliconFlow, did not linger at the dinner table. Instead, he seized the time to meet with his technical team, racing to solve the problem of adapting the DeepSeek model to domestic chips as soon as possible.

After three days and nights of around-the-clock work, they joined forces with Huawei to launch, on February 1, DeepSeek-V3 and DeepSeek-R1 services running on domestic chips. At that point, less than a week had passed since the DeepSeek AI assistant topped the App Store's free-app charts in China and the United States simultaneously.

Perhaps it is a coincidence, but for three years running a wave of AI excitement has arrived around the Spring Festival. In 2023, ChatGPT set off a domestic large-model startup fever; in 2024, the sudden appearance of the video-generation model Sora stunned everyone.

This year, the star of the show is DeepSeek, a technologically open but low-profile Chinese company that has single-handedly changed the global landscape of AI large models.

At the end of 2024 and the beginning of 2025, DeepSeek successively released its new-generation MoE model V3 and inference model R1. Their two most striking traits — "high performance at low training cost" and "on par with OpenAI o1" — won the company recognition both at home and abroad. According to Xsignal data, as of February 8, DeepSeek's domestic app had 34.94 million daily active users and its overseas app 36.85 million, surpassing similar apps in less than a month and closing in on ChatGPT.

Meta, the Silicon Valley giant behind the Llama series, had been the world leader in open-source large models. Now DeepSeek has built better-performing models at less than one-tenth of the cost, and made them free and open source, leaving Meta's "enormous spending" in an awkward position.

Next to feel the impact was NVIDIA. On market fears that DeepSeek's low-cost approach would reduce demand for computing power, NVIDIA's market value evaporated by nearly $600 billion overnight, the largest single-day loss in U.S. stock market history. A string of U.S.-listed chip stocks fell in its wake.

On the other side of the impact, almost every company tied to the large-model industry spent the Spring Festival scrambling to join the wave, eager to capture the traffic dividend behind it. Mainstream cloud vendors worldwide, without exception, quickly deployed DeepSeek APIs; beyond Huawei Ascend, a number of domestic AI chip vendors announced adaptations for DeepSeek models of various sizes.

Domestic investor enthusiasm was instantly ignited as well. The AI industry chain companies forming the "DeepSeek concept stocks" — spanning computing infrastructure, large-model technology and algorithms, and application scenarios — posted across-the-board gains in the secondary market for days on end.

Outside the spotlight, the large-model "Six Little Tigers" face renewed industry skepticism. In model influence, technical recognition, and product user scale, they appear to be lagging on every dimension. Yet a year ago, the "Six Little Tigers" were the Chinese startups chosen by capital to take on OpenAI.

Such is the breadth of DeepSeek's impact. It is at once a threat to its rivals and a tailwind for the industry. Either way, it has become the absolute protagonist of 2025.

"A mysterious force from the East"

At this year's Davos Forum, many attendees — in the tech world or not — asked Fusion Fund founding partner Zhang Lu about DeepSeek, a Chinese company they had only just heard of.

Even within Fusion Fund, Zhang Lu's partner David Gerster, who has more than 20 years of experience in AI and deep learning, kept asking her in group chats: What do you know about this company? Do you know its founders?

As if overnight, everyone wanted to understand where this out-of-nowhere company had come from.

Unlike other large-model companies, DeepSeek, founded in 2023, is an offshoot of the well-known quantitative fund High-Flyer Quant and has raised no external financing. Founder Liang Wenfeng was born in 1985 in Zhanjiang, Guangdong Province, graduated from Zhejiang University in 2010, and has worked in quantitative trading ever since, founding the Hangzhou-based High-Flyer.

DeepSeek has kept an unusually low profile since its founding, with almost no publicity. In AI technical circles, however, the company has been recognized since the middle of last year for DeepSeek-V2's innovations in MLA and elsewhere, and Silicon Valley came to regard it as a "mysterious force from the East."

Zhang Lu told Interface News that OpenAI and Anthropic employees were already discussing the company more than half a year ago, mainly because it was highly active in the open-source ecosystem, and the industry was unusually attentive to the architectural innovations in its new models.

Mark (a pseudonym), a Chinese employee working on large models at Meta, also told Interface News that they had paid attention to DeepSeek last year, but at the time concluded only that "this is a first-tier Chinese model, not world-class — still a cut below OpenAI and the like."

The recent arrival of DeepSeek-V3 and DeepSeek-R1, however, marked a turning point. DeepSeek-R1 in particular set the user side alight: it was the first open-source large model to replicate the capabilities of OpenAI's o1 inference model, and it was completely free.

DeepSeek's four stacked "buffs" left even Chinese engineers in Silicon Valley impressed: an all-Chinese team, limited computing resources, performance beating or matching state-of-the-art U.S. models, and full open source.

Mark and his colleagues discussed it at length, the central question being: "If DeepSeek can train models this good for so little money, what are we spending all ours on?"

Meta employees previously posted on an anonymous U.S. workplace forum that the company had set up four dedicated research groups to analyze how the DeepSeek V3 model works. Internally, V3's overall performance is judged to have surpassed Llama 3, and the company worries that the planned next-generation Llama 4 is "falling behind."

Zhang Lu predicted that "Llama 4 will certainly reference DeepSeek's current algorithmic approach." In fact, the whole industry is now studying the secret of DeepSeek's success.

According to the DeepSeek development team, the V3 model reduces costs through a series of innovations including model compression, multi-head latent attention (MLA), a mixture-of-experts (MoE) architecture, and FP8 mixed-precision training; the R1 model adds another important innovation, the Group Relative Policy Optimization (GRPO) algorithm.
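For readers curious what GRPO actually does: instead of training a separate value (critic) model, it scores a group of sampled answers to the same prompt and normalizes each answer's reward against the group's mean and spread. A minimal sketch of that group-relative advantage step (the function name and numbers are illustrative, not DeepSeek's code):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO's core idea: an answer's advantage is its reward normalized
    against the other answers sampled for the same prompt, removing the
    need for a separately trained value model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt, scored by a reward model:
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)  # best answer gets a positive advantage, worst negative
```

Answers above the group mean are reinforced and those below are penalized, which is how R1's reasoning behavior is shaped without a critic network.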

Lv Qihang, senior director of market ecology at Moore Threads, told Interface News that DeepSeek's core breakthrough lies in optimizing algorithms and compute efficiency. Although the model projects are open source, the company's mastery of these techniques constitutes a fairly high barrier, requiring large engineering teams for R&D and optimization. Other companies will find it very difficult to copy directly in the short term, but the industry can still draw inspiration from DeepSeek's innovations.

Zhang Lu's clearest impression in Silicon Valley is that when OpenAI and Anthropic used to talk about DeepSeek, they never thought "it will do better than us" — but now, "OpenAI in particular may see DeepSeek as a competitor." OpenAI CEO Sam Altman even said on a recent podcast that he plans to talk with the DeepSeek team.

Stirring up the computing-power market

DeepSeek, whose logo is a whale, is now seen as a "catfish" stirring up the global technology market.

Wu Chao, director of the CITIC Securities Research Institute, told Interface News that DeepSeek's "catfish effect" is most directly embodied in a wave of AI cost reduction: in the future, everyone will be able to develop large models at lower cost.

According to DeepSeek's official paper, V3 was trained on 2,048 NVIDIA H800 GPUs, consuming 2.788 million GPU hours in total; at a rental price of $2 per GPU hour, that works out to about $5.576 million.
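The paper's headline figure is simple arithmetic on the reported GPU hours; a quick back-of-the-envelope check (the $2/hour rental price is the paper's assumption, not an actual invoice):

```python
# Reproduce the ~$5.576M training-cost figure from the reported numbers.
gpu_hours = 2_788_000        # total H800 GPU hours reported for V3 training
price_per_hour = 2.0         # assumed rental price, USD per GPU hour
total_cost = gpu_hours * price_per_hour
print(f"${total_cost:,.0f}")  # → $5,576,000

gpus = 2_048
days = gpu_hours / gpus / 24  # wall-clock duration if all GPUs run nonstop
print(f"about {days:.0f} days of training")
```

The same arithmetic shows the run fits in roughly two months of wall-clock time on the 2,048-GPU cluster, which is what makes the contrast with Llama 3.1's 16,000-plus H100s so stark.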

By comparison, Llama 3.1 was trained on more than 16,000 NVIDIA H100 GPUs at a cost of hundreds of millions of dollars, and the industry estimates the training cost of GPT-4o at around $100 million.

Anthropic founder Dario Amodei wrote on his personal website that while claims of "matching billions of dollars of investment with $6 million" are grossly exaggerated, DeepSeek's innovations have indeed dramatically reduced costs. What shocked Silicon Valley even more: while the whole world was seeking to cut the cost of AI models, "the first to do it was a Chinese company."

Wall Street is once again panicking about an AI compute bubble. Investors worry: once DeepSeek's low-cost approach spreads, will tech companies still need to keep buying NVIDIA's advanced AI chips in bulk to support model development?

In the heavy chip-stock selloff of January 27 on U.S. markets, NVIDIA fell more than 17%, TSMC more than 13%, and ASML 5%.

NVIDIA, however, emphasized in a statement to Interface News the day after the plunge that, given mainland China's constrained AI computing resources, DeepSeek's innovations mainly target the inference stage — and inference still requires large numbers of NVIDIA GPUs and supporting high-performance networking. DeepSeek's progress does not signal a compute surplus; "instead, it proves that the market needs more AI chips."

A classic economics theory, the "Jevons Paradox," has recently become a popular market explanation for the shifting supply and demand for computing power: when a technological innovation reduces costs and improves efficiency, resource consumption does not fall — it rises sharply as lower costs enable wider application.
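A toy calculation makes the paradox concrete (the prices and the demand-growth factor below are purely illustrative, not figures from the article):

```python
# Jevons Paradox, illustrated: unit cost falls 10x, but cheaper access
# unlocks far more use cases, so total resource consumption can rise.
old_cost_per_query = 0.10   # hypothetical cost per model query, USD
old_queries = 1_000_000     # hypothetical daily demand at that price

new_cost_per_query = old_cost_per_query / 10   # 10x cost reduction
new_queries = old_queries * 30                 # assumed demand explosion

old_spend = old_cost_per_query * old_queries
new_spend = new_cost_per_query * new_queries
print(new_spend > old_spend)  # total spend on compute still grows
```

If cheaper inference grows usage faster than the cost falls, aggregate demand for chips increases, which is exactly the argument NVIDIA and the cloud giants are making.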

In fact, Microsoft, Amazon, Meta, Google and other giants have all stated in recent earnings calls that they will significantly increase capital expenditure in 2025, focused on data centers and other AI infrastructure. The four giants' 2025 capital expenditures will total more than $320 billion, a combined growth rate of about 30%.

In addition, OpenAI, SoftBank, Oracle and others have jointly launched the "Stargate" AI infrastructure program (The Stargate Project), which plans to invest up to $500 billion in computing infrastructure by 2029, with an initial investment of $100 billion.

Dennis Laudick, vice president of product management at British GPU company Imagination, told Interface News that DeepSeek's innovations belong to the category of advances that reduce AI power consumption and optimize efficiency. "Any simplification of workloads allows computing resources to be used more fully, and demand for compute will continue to outstrip supply."

"Even DeepSeek itself now lags frequently because its servers can't keep up. The computing market has in fact been in a high-growth expansion cycle for quite some time," Wu Chao said. Whether judged by the baseline of the giants' capital spending or by new investment in inference-side compute from more companies, the compute bubble shows no sign of bursting.

Industry insiders also see DeepSeek's innovation as a major boon for domestic AI chips. For example, DeepSeek used FP8 mixed-precision training; this low-precision approach can, to a degree, compensate for the hardware gap in domestic chips and opens more room for software and algorithmic innovation. Moore Threads, for one, has already brought the corresponding techniques into its products.

The AI "Six Little Tigers" feel the squeeze

Zhang Yutong, co-founder of Moonshot AI, recently posted on WeChat Moments a chart of Kimi's user growth, captioned: "Delighted that Kimi's user numbers hit a new high after the new version launched."

What many people don't realize is that Moonshot AI's latest large model, Kimi 1.5, was released on the same day as DeepSeek-R1 — and was completely drowned out by the overwhelming DeepSeek wave.

Moonshot AI's headquarters is less than a kilometer from the Rongke Information Center, where DeepSeek's Beijing team sits. Last spring, DeepSeek was still unknown, while Moonshot AI rose to fame on the back of Alibaba's huge investment and became the leader of the large-model startup "Six Little Tigers."

One year on, DeepSeek has unquestionably vaulted into the world-class ring in technological innovation, product recognition, and corporate influence alike. The "Six Little Tigers," meanwhile, each have their own problems.

Li Yang (a pseudonym) spent the entire Spring Festival of the Year of the Snake working overtime on AI infra optimization at one of the "Six Little Tigers." In his view, DeepSeek-V3's MFU (model FLOPs utilization) makes sense given its model structure, but it still puts pressure on his team's work, which is an extremely important part of cost control.
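For readers unfamiliar with the metric: MFU compares the useful FLOPs a training run actually performs against the hardware's theoretical peak over the same period. A rough sketch using the common 6·N·D approximation for transformer training FLOPs (the parameter, token, and peak-throughput figures below are illustrative placeholders, not numbers from the article):

```python
def mfu(active_params, tokens, gpu_hours, peak_flops_per_gpu):
    """Model FLOPs utilization: useful training FLOPs divided by the
    cluster's theoretical peak FLOPs over the whole run.
    Training FLOPs ~ 6 * params * tokens (standard approximation)."""
    achieved = 6 * active_params * tokens
    available = gpu_hours * 3600 * peak_flops_per_gpu
    return achieved / available

# Illustrative inputs: 37B active params, 14.8T tokens, 2.788M GPU hours,
# ~989 TFLOPS peak per GPU. None of these are asserted by the article.
print(f"MFU ≈ {mfu(37e9, 14.8e12, 2_788_000, 989e12):.0%}")
```

A higher MFU means less of the rented GPU time is wasted on communication, memory stalls, or idle waits, which is why infra teams like Li Yang's treat it as a core cost-control metric.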

Li Yang is not alone: several "Six Little Tigers" employees told Interface News that they took off only New Year's Eve and the first day of the New Year, spending the rest of the holiday at work as usual. Interface News has learned that one company's algorithm team spent nearly the entire holiday scaling up the parameters of its own inference model.

A technical lead at one of the "Six Little Tigers" said the work atmosphere has grown tense because of the public's heightened expectations for AI. The company will keep its original iteration rhythm, "but the priority of some projects may be adjusted."

An investor in the large-model field told Interface News that because of DeepSeek's sudden rise, the next round of financing and valuations for the "Six Little Tigers" will be thoroughly affected. As he understands it, one state-owned investor in a large-model company is now being questioned internally about why it chose that investment in the first place. "That is, in effect, a form of accountability."

Right now, the "Six Little Tigers'" urgency to find a differentiated core path is more acute than ever: without continued heavy spending on innovation, their pre-trained and inference models can hardly beat DeepSeek's; but without pre-training, it is hard to support the valuation needed for the next financing round.

And the first road is even harder, because DeepSeek is likely to batter the closed-source ecosystem as well. Zhu Xiaohu has said bluntly that in China, only the big Internet companies can still justify grinding away at closed-source models.

In fact, cracks have appeared even in the Internet giants' closed-source fortress. Baidu, long regarded as the industry's most committed to the closed-source route, formally announced on February 14 that it would launch the Wenxin 4.5 model series in the coming months and open-source it for the first time from June 30.

During last year's heated industry debate over open-source versus closed-source models, Baidu founder Robin Li repeatedly and publicly backed closed source, bluntly calling open-source models "an IQ tax." Less than a year later — whether Baidu's current choice is active or passive — such a 180-degree turn has forced the industry to re-examine the future of both routes.

"The big Chinese closed-source model has almost become a dead end," the aforementioned investor said.

Getting a share of the action

A month before DeepSeek caught fire this spring, Liang Wenfeng had asked Yuan Jinhui whether he wanted to deploy the DeepSeek-V3 model on his platform. Yuan Jinhui founded SiliconFlow, a large-model cloud service platform that provides infrastructure for generative AI.

Liang Wenfeng suggested at the time that deployment would ideally require 80 H100 servers. Yuan Jinhui ran the numbers, reckoned it would cost five to six million a month, and passed. When DeepSeek then caught fire worldwide, he suddenly felt his "decision-making was poor — enough to make you cry."

Watching DeepSeek keep working miracles, Yuan Jinhui grew anxious over his lack of resources. Then a colleague had an idea: "Let's use domestic cards!" The idea won strong support from Huawei's Ascend Cloud team — which is how the story that opened this article came about.

Since then, Yuan Jinhui has turned into an active "customer service account" on social media, constantly collecting feedback on product shortcomings and areas to strengthen and improve. On seizing the chance to ride the DeepSeek traffic vortex, Yuan Jinhui said simply: "It spread too fast, there are too many users, demand is too great."

"Participating" in the DeepSeek frenzy has become an industry consensus, especially among cloud vendors. When DeepSeek's own service frequently showed "server busy, please try again later" under the flood of visits, the traffic with nowhere to go turned to cloud platforms hosting DeepSeek.

Foreign companies, if anything, moved faster. In the last two days of January, Microsoft Azure and Amazon AWS announced they had brought DeepSeek-R1 online, and Google Cloud promptly published an R1 deployment guide. In the first week of February, beyond SiliconFlow and Huawei Ascend Cloud, vendors including Tencent Cloud, Alibaba Cloud, Baidu Intelligent Cloud, Volcano Engine, and JD Cloud announced access to the R1 inference model, with some bringing V3 online at the same time.

The three major telecom carriers — China Mobile, China Telecom, and China Unicom — which previously had little presence in the AI field, have also brought DeepSeek models online in their cloud services.

Among chip vendors, NVIDIA brought R1 into NVIDIA NIM, while AMD integrated V3 into its Instinct MI300X GPUs with optimizations for AI inference. Domestic chip vendors Moore Threads, MetaX, Biren, and Iluvatar CoreX collectively announced adapted DeepSeek model deployments.

Even other large-model peers have begun tapping DeepSeek. Kunlun Wanwei was among the first movers. The company has its own Tiangong large model and has built its own inference model, yet it still launched a "DeepSeek-R1 + web search" feature in its Tiangong AI.


Fang Han, CEO of Kunlun Wanwei. Photo credit: Interface News

Kunlun Wanwei CEO Fang Han takes an open attitude, telling Interface News that the company is not adjusting its strategy under duress but is convinced the move improves user experience. He has observed that since adding R1, the average time users spend in Tiangong AI search has grown significantly longer.

Also notably, Tencent Yuanbao has connected DeepSeek R1 alongside its own Hunyuan large model — making Tencent the first major Internet company to bring R1 into its consumer-facing AI assistant.

A tailwind arrives for the AI industry chain

After DeepSeek's sudden global rise, more voices in U.S. politics have called for tighter chip controls on China. Others have countered that it may have been precisely the controls on high-end chips that pushed DeepSeek toward these innovations in algorithmic architecture and engineering.

In Fang Han's view, compute limitations create barriers only in the short term; in the long run, restricting China's compute will only strengthen Chinese researchers' drive to squeeze efficiency from their hardware. "If Scaling Law ultimately comes down to algorithms rather than compute, the American bet will fail," Fang Han said.

Inspired by DeepSeek, Fang Han is already considering applying GRPO and related algorithms in Kunlun Wanwei's subsequent model training. He even calls it a "T0-level" innovation: "How to generalize it beyond math and programming to other vertical fields, and to other modalities in the future — I think that is very valuable."

Beyond companies like Kunlun Wanwei, more downstream application companies are riding DeepSeek's coattails.

Take education companies: institutions including Good Future (TAL), NetEase Youdao, Yunxuetang, Offcn Education, Yuanfudao, and Readboy have rushed to connect to DeepSeek, covering businesses from online education, vocational training, and personalized learning to corporate staff training. Some industry insiders have even called it "education AI's true Normandy moment."

Tian Mi, CTO of Good Future, told Interface News that the team has tracked DeepSeek's progress since the earliest V1 release and applied it to its own business. "Judging from the results, they're doing great."

Domestic phone makers are also piling in: Huawei, Honor, OPPO, vivo, Meizu, RedMagic, and Nubia have announced access to DeepSeek-R1. The AI head at one manufacturer told Interface News that DeepSeek's biggest impact on phone makers is its open source — and that high-cost inference models can finally be deployed on consumer products like phones. Going forward, they will consider distilling small on-device models.

The securities industry has been swept up by DeepSeek as well. To date, nearly 20 brokerages, including CICC Wealth, Guotai Junan, Huafu Securities, GF Securities, IFC Securities, and Industrial Securities, have announced completion of localized DeepSeek model deployments. Application scenarios center on intelligent investment research, customer service, investment advisory, IT operations, risk control, and marketing.

For example, a person in charge at IFC Securities said its DeepSeek-based "deep thinking" industry-chain intelligent mining system has already been put into use.

DeepSeek is also opening more opportunities for application startups. Even though DeepSeek now makes the best open-source models in China and the world, application needs are highly varied, and DeepSeek cannot meet every long-tail demand.

As Liang Wenfeng said in an earlier interview about his expectations for the endgame of large models: in the future, some companies will specialize in providing base models and services, forming a professional division of labor along a long industry chain, while more companies build on those foundations to meet society's diverse needs.

Investors who once chased basic large models have begun shifting their focus to "AI applications."

Chen Yu, a partner at Yunqi Capital, judged that 2025 may see a wave of ISV companies using open-source large models to serve the market at low cost, breeding opportunities for a large number of independent small players behind them.

He sees greater potential in application scenarios that pursue full automation rather than co-pilot mode once model capabilities improve. "It's like assisted driving: when the technology isn't good enough, L2 meets the need; once the technology is good enough, people still want L4."

Recently the market has been rife with rumors that DeepSeek is raising money at an $8 billion valuation. Chen Yu believes that chasing DeepSeek makes no sense for VCs right now; it is more important to look at other opportunities arising from changes in the AI industry chain, such as embodied intelligence, edge computing, smart hardware, and the many independent applications.

"If you can't invest in the big model itself, you can still lay out its periphery in advance," he said. "There's a lot more blossoming out there."

This article is from the WeChat account "Polyhedron InterfaceX," written by Wu Yangyu and Li Biao and edited by Liu Fangyuan.
