An article that takes you through the ins and outs of the DeepSeek blowout event

trade2mos agoupdate AiFun
17,194 0

During the Spring Festival, the most hot topic is theDeepSeekDespite the fact that it's all over the internet, there are still a lot of people who don't know what DeepSeek is, why it's so hot, and what's so great about it. So we've compiled a list of eight basic questions about DeepSeek, in the hope that it will give those who need it some reference to review the whole story.

一文带你了解DeepSeek爆火事件的来龙去脉

I . What is DeepSeek?

DeepSeek is a technology company that focuses on implementing general artificial intelligence (represented by large models). It was founded in July 2023 by quantitative capital management giant "Phantom Quantitative".

DeepSeek also refers to the ChatGPT-like intelligent assistant developed by DeepSeek Inc. Currently, this intelligent assistant is available on both web and mobile and with its amazing speed and strength, it has been launched in theA global "earthquake" in the tech world.This app is known as the "Light of Domestic AI". This app, which is known as the "Light of Domestic AI", has not only topped the free list of App Store in the U.S. region, but also occupied the top of the free list of App Store in China, demonstrating its strong market appeal.

In addition, outsiders have referred to the company's development of a series of large model products generically as "DeepSeek."

II . What big models have been released by DeepSeek?

DeepSeek has released 13 large models, all of which are open source. Developers around the world can use DeepSeek's technology to develop their own models, applications, and products.

The basic information of each model is shown in the table below:

一文带你了解DeepSeek爆火事件的来龙去脉

The models that have recently attracted a lot of attention around the world are mainly the self-developed generalized large model DeepSeek-V3, the inference model DeepSeek-R1 .

DeepSeek-V3 is a generalized model that can be tried with V3 for common everyday problems.

DeepSeek-R1 It is a reasoning model that is good at dealing with complex, multi-step thinking problems, and is suitable for doing deep research, solving code problems, and mathematical problems.

一文带你了解DeepSeek爆火事件的来龙去脉

(*DeepSeek's official GitHub page: https://github.com/deepseek-ai)

III . How can regular users use DeepSeek and where can they call the API?

DeepSeek is now available as an official app, both on the web and on mobile.

Regular users can use DeepSeek's products by signing up, and both the Web and the APP are currently free.

The Web side is accessed directly by visiting the URL (https://chat.deepseek.com/) dialog. In the lower left position of the dialog, you can choose whether to turn on the " DeepSeek " mode or not. If checked, the DeepSeek-R1 model will be used; if unchecked, DeepSeek-V3 will be used by default.

一文带你了解DeepSeek爆火事件的来龙去脉

App Search for "DeepSeek" in the app store, but be careful to choose the official version.

一文带你了解DeepSeek爆火事件的来龙去脉

On the APP side, users can choose to use both the networking and reasoning functions.

一文带你了解DeepSeek爆火事件的来龙去脉

Recently, however, DeepSeek has been striking from time to time due to cyber-attacks (or other factors), requiring constant retries to get the model to return results.

Developers can also call DeepSeek's APIs through a number of channels.

DeepSeek Developer Platform:
Accessing the DeepSeek Consolehttps://platform.deepseek.com/If you want to get the key, register, log in and purchase the key.

NVIDIA NIM Microservices:
https://build.nvidia.com/deepseek-ai/deepseek-r1To register an account with DeepSeek-R1, you need to use your email address.

Microsoft Azure:
https://ai.azure.com, Microsoft Azure can deploy DeepSeek-R1 to create a chatbot through a chat playground.

Amazon AWS:
https://aws.amazon.com/cn/blogs/aws/deepseek-r1-models-now-available-on-awsDeepSeek-R1 is now available in the Amazon Bedrock Marketplace and Amazon SageMaker JumpStart, as well as in the Amazon Bedrock Custom Model Import and Amazon EC2 instances to use the DeepSeek-R1- Distill model.

Silicon-based mobility SiliconCloud :
https://siliconflow.cn/zh-cn/models, went live with DeepSeek-V3 and DeepSeek-R1 based on Huawei Cloud's Rise Cloud service, and developers can directly call SiliconCloud APIs at the same price as DeepSeek's official promo period price.

In addition, Cerebras, Groq can also call the API of DeepSeek-R1.

IV . What can DeepSeek do?

After DeepSeek became a big hit, various uses were developed by various people:

The first is a high emotional intelligence accompaniment:

一文带你了解DeepSeek爆火事件的来龙去脉

(* Source: Internet)

There are a very large number of netizens who use it as a fortune teller and count the Ziwei star.

一文带你了解DeepSeek爆火事件的来龙去脉

Others use DeepSeek as a financial advisor. It will directly help you comprehensive assessment, high-risk, high-return rate program (DeepSeek does not constitute any investment advice).

一文带你了解DeepSeek爆火事件的来龙去脉

There are also users who make comprehensive use of DeepSeek's document summarization, text generation and assisted code writing features to generate all kinds of social copy and cards.

一文带你了解DeepSeek爆火事件的来龙去脉

(* From the case of Wojciech Loves AI)

Some netizens even used DeepSeek to write up PS retouching scripts to realize one-click retouching.

一文带你了解DeepSeek爆火事件的来龙去脉
一文带你了解DeepSeek爆火事件的来龙去脉

(* ps script written by DeepSeek)

一文带你了解DeepSeek爆火事件的来龙去脉

(* after running the script)

V . Why are the V3 and R1 models getting so much attention?

These two models, have the following features:

1. Excellent performance

The performance of these two models is close to and in some scenarios even surpasses the best products of OpenAI, the "recognized" global benchmark company (DeepSeek-V3 against GPT-4o and DeepSeek-R1 against o1).

2. Combined applications

After the release of both models, DeepSeek's Web/APP will be launched so that more people can experience the effect of the models.

3. Low training costs and cost-effective products

V3 uses only 2,048 H800 GPUs and takes 3.7 days to train, defying conventional wisdom in terms of the number of GPUs used and the length of training.

Both R1 and V3 are available for free on the DeepSeek website; API pricing has the R1 input portion at 1.821 TP4T for o1 and the output portion at 3.651 TP4T for o1, and the V3 input portion at 1.121 TP4T for GPT-4o and the output portion at 2.81 TP4T for GPT-4o.

4. Technological innovations

The training model of DeepSeek-R1 disrupts the conventional knowledge. deepSeek-R1 is the first model that validates significant inference enhancement and emergence through RL (Reinforcement Learning) alone without SFT (Supervised Fine-tuning). This training approach dramatically reduces the cost of data labeling, simplifies the training process, and reduces the overall training cost.

5. Open source

There are no other open source models that benchmark GPT-4o and o1 in terms of performance, and none of the core models that are the mainstay of OpenAI are open source, so users have to call them through an app or API.

一文带你了解DeepSeek爆火事件的来龙去脉

(*Official metrics put the V3's overall performance close to GPT-4o, with scores even exceeding 4o on many specific review datasets.)

一文带你了解DeepSeek爆火事件的来龙去脉

(*DeepSeek-R1 is benchmarked against OpenAI's o1. According to official reviews, R1's performance is indeed close to that of o1, and slightly higher on some review sets.)

一文带你了解DeepSeek爆火事件的来龙去脉

(*Price comparison of API calls for DeepSeek-R1 vs. OpenAI o1)

VI . Why does DeepSeek make Silicon Valley so " scared"?

1. Chinese AI companies are making real innovations, and U.S. tech majors are worried about losing their lead.

Prior to this, model-level technological innovation, while not uncommon, has been a rhythm of U.S. modeling vendors taking the lead and other vendors following suit to validate. This time DeepSeek is ahead of the curve.

First, DeepSeek is innovative in both model training and architecture.

Prior to DeepSeek-R1, the more common training route for models was SFT combined with RL (Supervised Fine-Tuning combined with Reinforcement Learning), and in this release, DeepSeek demonstrates for the first time in experiments that RL alone can lead to capacity improvements.

Meanwhile, a key architectural innovation of the V3 model is the Multi-Head Latent Attention mechanism, which can significantly reduce the cost and improve the efficiency of the reasoning phase.

These are things that American AI companies are not doing.

AI development has long relied on the accumulation of computational power in what can be described as a race between hyperscalers.

Against US competitors, DeepSeek's innovations have achieved an order of magnitude reduction in training costs and price of use, eroding a significant market-leading advantage for US companies.

2. Open source: an ecology that, if it catches fire, will capture the market for U.S. companies

DeepSeek's R1 not only publicized the training process through a technical report, but also open-sourced the weights of the model.

DeepSeek's inference models have high performance and low price, enabling developers to use them in a growing number of scenarios.

Recently, Microsoft, NVIDIA, and AWS have all accessed DeepSeek-R1.

3. Big model-related U.S. tech stocks have been hit hard, with the first signs of a "threat" emerging.

NVIDIA's stock plummet seems to hint at a real threat from DeepSeek.

This is because DeepSeek's route shows, to some extent, that it is possible to train big high-performance models without the most powerful arithmetic, and DeepSeek's route of open-sourcing high-performance models may cause more companies to give up on training the models, hitting demand for NVIDIA's core arithmetic products (GPUs) and affecting the stock price.

Moreover, the market is concerned that DeepSeek's success will impact the market prospects of US-focused tech companies such as OpenAI, especially in the direction of closed-source modeling.

一文带你了解DeepSeek爆火事件的来龙去脉

(*Performance of DeepSeek's successive releases against industry-leading models)

Seven . Where else will DeepSeek iterate in the future?

This is partly speculation about DeepSeek's future moves.

Based on current results, outsiders believe that future innovations will still be centered around theCost, performanceThese two core elements.

Multimodal capability complement.In the early hours of New Year's Eve, DeepSeek's newly released DeepSeek-Janus-Pro model is a multimodal model with both visual understanding and visual generation capabilities.

However, the Janus family of models are all small parameter count models, and how to train a multimodal model with a large parameter count through Janus' innovative modeling framework may be one of the future focuses.

DeepSeek is finally launching an APP product for C-users in January 2025, and may in the futureExplore / Collaborate for more applications.

The impact of DeepSeek's explosion on the AI landscape in China and the U.S.?

1. Domestic AI companies face further restrictions.

The results of DeepSeek's low-cost training may cause the U.S. to further shrink the number of chip models available for export. In the future, there will be fewer and fewer GPU models available to domestic modelers, with older generations.

Some countries and regions have asked DeepSeek to stop the service due to privacy, data compliance and other queries.

X (Twitter), some AI science bloggers have shifted from mindless posts bragging about DeepSeek to posts that teach users " how to locally deploy a DeepSeek R1 to protect their data."

2. Competition in the global AI ecosystem could be reshaped.

The fact that DeepSeek is recognized by the market shows, to a certain extent, that algorithmic efficiency and cost-effectiveness will become the core elements in future competition.

DeepSeek is pushing the AI industry from an " arithmetic arms race " to a " algorithmic efficiency war ", and AI technology is further universalized.

Companies that used to be "math-focused" are going to have to revisit their strategies.

3. Silicon Valley giants are desperate to get back ahead.

A sense of urgency to innovate technologically and regain the lead grips America's tech giants.

Allegedly, Google, Apple, Meta and other companies have begun in-depth study of DeepSeek. each earnings call, DeepSeek is also a question that can not be bypassed.

It is imperative for Silicon Valley houses to launch the next generation of leading models as soon as possible.

(Note: This article is reprinted from Tencent.com, the original "eight questions to take you zero basis to understand DeepSeek", slightly changed)

© Copyright notes

Related posts

No comments

none
No comments...