Wenxin 5.0 Official Version Released: What Makes the LMArena Chart-Topping "Strongest Liberal Arts Student" So Strong?


Recently, at the Wenxin Moment conference, the Wenxin Big Model 5.0 official version went online.

The model reportedly has 2.4 trillion parameters. It adopts native omnimodal unified modeling technology with full-modal comprehension and generation capability, and supports input and output of text, image, audio, and video.

Across more than 40 authoritative benchmarks, the language and multimodal comprehension capabilities of the Wenxin 5.0 official version sit firmly in the international first tier. Its audio and visual generation capabilities are comparable to specialized vertical-domain models, placing it among the world leaders overall.


Currently, individual users can try it on the Wenxin APP and the Wenxin Yiyan official website, while enterprises and developers can call it through Baidu's Qianfan platform.

We were among the first to try Wenxin 5.0. Our evaluation shows that the model not only handles tasks involving complex emotions, subtext, and visual metaphors across different cultural contexts, producing more contextualized and scenario-aware replies, but also generates creative, logically coherent writing through strong planning, reflection, and reasoning. It can fairly be called the big-model world's "strongest liberal arts student".

Although the preview version had already set expectations, the launch of the Wenxin 5.0 official version is still impressive. Domestic multimodal models have truly entered the "native omnimodal" era.

I. Firmly in the World's First Echelon: Wenxin 5.0 Opens the Path to Native Omnimodality

Wu Tian, Vice President of Baidu Group and Deputy Director of the National Engineering Research Center for Deep Learning Technology and Applications, explained that unlike most multimodal solutions in the industry, which use "late fusion", Wenxin 5.0's technical route uses a unified autoregressive architecture for native omnimodal modeling. Text, image, video, audio, and other multi-source data are trained jointly in the same model framework, so multimodal features can be fully fused and co-optimized under a unified architecture, achieving native omnimodal unified understanding and generation.


▲Wu Tian, Vice President of Baidu Group and Deputy Director of National Engineering Research Center for Deep Learning Technology and Applications

Wenxin 5.0 has overcome the difficulty of unified modeling for multimodal comprehension and generation, finely modeling multimodal semantic features so that comprehension and generation reinforce each other, comprehensively strengthening its full-modal understanding and generation capabilities.

Wenxin 5.0 uses an ultra-large-scale mixture-of-experts (MoE) architecture, trained on the PaddlePaddle deep learning framework, with a total parameter count exceeding 2.4 trillion, the largest publicly disclosed figure in the industry. Thanks to ultra-sparse activation, fewer than 3% of the parameters are activated, which reduces computation and inference costs while preserving the model's capability.
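A quick back-of-envelope sketch makes the economics of sparse activation concrete. The figures below come from the article (2.4 trillion total parameters, under 3% activated); the calculation itself is our illustration, not a disclosed detail of Baidu's architecture:

```python
# Sparse-activation arithmetic for an MoE model (assumed figures from
# the article, not Baidu's published architecture details).
TOTAL_PARAMS = 2.4e12      # reported total parameter count (2.4 trillion)
ACTIVATION_RATIO = 0.03    # "below 3%" of parameters active per forward pass

activated = TOTAL_PARAMS * ACTIVATION_RATIO
print(f"Parameters active per forward pass: <= {activated / 1e9:.0f}B")
```

In other words, each forward pass touches at most roughly 72 billion parameters, so the per-token compute cost resembles that of a much smaller dense model even though total capacity is 2.4 trillion.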

Meanwhile, synthesizing long-horizon task trajectory data from large-scale tool environments, and applying end-to-end multi-round reinforcement learning over chains of thought and action, significantly improves the model's agentic and tool-invocation capabilities.

Whether judged by its technical architecture or its big-model infrastructure, Wenxin 5.0 is close to the best-equipped domestic large model, and this has repeatedly earned it top placements on LMArena, the authoritative international large-model arena.

In the past three months, the Wenxin 5.0 series has appeared on the LMArena leaderboard five times. On both the Text Arena and Vision Arena leaderboards it has repeatedly ranked first among Chinese models, and it is the only Chinese large model to enter the world's first echelon.

II. It Can Write Science Fiction and Dissect a Blind-Date Resume: Wenxin 5.0 as the "Strongest Liberal Arts Student"

When Wenxin 5.0 Preview was released, some users dubbed it "the strongest liberal arts student". Today, we put Wenxin 5.0 to the test to see whether the title is deserved.

For the first round, let's see how Wenxin 5.0's knowledge base and literary skills stack up:

We start by having Wenxin 5.0 write a sequel to Liu Cixin's science fiction novella The Wandering Earth, requiring it to follow the original's writing style, setting, and characterization, to test its knowledge base.

In about 3 minutes, the model completed a very short sequel to The Wandering Earth, titled The Wandering Earth: The Silent Era.

The piece reads as if the model had actually "read" the original novel: it incorporates elements such as the "Great Rebellion" and the "solar helium flash" from the source text, and narrates from the first-person perspective the story of humanity, with resources running out, adopting Plan B, the Spark Program. Overall the text flows smoothly, with Liu Cixin's matter-of-fact style and well-paced plotting.

Beyond novels, how well-read is Wenxin 5.0 in popular dramas? We posed this question: the palace-intrigue dramas Legend of Zhen Huan, Ruyi's Royal Love in the Palace, and Story of Yanxi Palace are often compared online; if Zhen Huan, Ruyi, and Wei Yingluo were in the same imperial palace, who would have the last laugh?

Wenxin 5.0 first chose an era in which all three characters could coexist, analyzed each one's personality and experience, and through pairwise matchups settled on Wei Yingluo as the final winner. The analysis is well organized, as if written by a seasoned fan of the dramas.

Next, let's test Wenxin 5.0's emotional intelligence:

First, we uploaded a screenshot of a Xiaohongshu (Little Red Book) post asking for help on how to reply to a girlfriend who keeps saying "you don't love me anymore".

Wenxin 5.0's thinking process shows that it assessed the girlfriend's behavioral motives as well as the boyfriend's psychology, first putting itself in the user's shoes to offer comfort before moving on to solutions, which softens any sense of lecturing.

As for the answer itself, Wenxin 5.0 offered four sets of tactics, each genuinely workable, and made clear that when a girlfriend keeps saying "you don't love me", the subtext is really "I miss you". The specific wording is a touch smooth-talking, though; boyfriends who don't go for the cutesy style probably shouldn't copy it verbatim.

Then we uploaded a short video titled "Can I Marry This Boy?" to see whether Wenxin 5.0 could spot anything wrong in the date's resume.

The original video is one minute long, and the host speaks in a rapid mix of English and Chinese; without the subtitles I could hardly keep up. Yet Wenxin 5.0 understood and analyzed the video within a minute, picking out the unreasonable and concealed items in the date's resume, with wording that pulled no punches.

III. Native Omnimodality: Pointing to the Future of Multimodal Large Models

How is such a striking experience achieved? To answer that, we have to start with the categories of multimodal large models.

Multimodal large models on the market today fall mainly into two types: spliced and native. The spliced type is the industry mainstream: it uses a modular architecture, training each modality's model independently and then stitching them together. This offers some flexibility but suffers from obvious information loss.

"Native multimodality" was first proposed with GPT-4o, and the release of Gemini 3 has brought it into the spotlight. Baidu, for its part, went a step further by proposing a "native omnimodal" architecture.

The native omnimodal architecture, by contrast, starts from the underlying logic at the very beginning of training, deeply fusing multimodal data (text, images, audio, video, and more) to build a unified semantic space, thereby achieving more efficient cross-modal understanding.

At the same time, the "native omnimodal" approach effectively avoids catastrophic forgetting, allowing modal data to fuse more smoothly at the base level and sharply improving generalization across modal tasks.
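To make the "unified semantic space" idea concrete, here is a minimal conceptual sketch, entirely our illustration and not Baidu's implementation: tokens from every modality are mapped into disjoint ranges of one shared ID space, so a single autoregressive model can consume them as one stream. All vocabulary sizes below are assumed for illustration:

```python
# Conceptual sketch of a unified token space across modalities
# (illustrative only; vocabulary sizes are assumptions, not Wenxin's).
TEXT_VOCAB = 50_000    # assumed text vocabulary size
IMAGE_VOCAB = 8_192    # assumed discrete image-token codebook size
AUDIO_VOCAB = 4_096    # assumed discrete audio-token codebook size

# Each modality gets a disjoint ID range in one combined vocabulary,
# so one embedding table and one transformer can serve all of them.
IMAGE_OFFSET = TEXT_VOCAB
AUDIO_OFFSET = TEXT_VOCAB + IMAGE_VOCAB

def unify(text_ids, image_ids, audio_ids):
    """Map per-modality token IDs into one flat sequence of global IDs."""
    return (
        list(text_ids)
        + [IMAGE_OFFSET + i for i in image_ids]
        + [AUDIO_OFFSET + i for i in audio_ids]
    )

seq = unify([12, 305], [7, 4090], [99])
print(seq)
```

Because every modality lives in the same ID space from the start of training, understanding and generation share one set of weights, which is the structural contrast with the "spliced" approach of training separate per-modality models and fusing them late.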

One view in the industry holds that this is essentially a dispute over technical routes: "native architecture" is rewriting the rules of the game for large-model vendors. If domestic manufacturers fail to break through on native architecture in 2025-2026, they may be reduced to feature followers in the coming AI competition. Clearly, Wenxin 5.0 has let Baidu take the lead on this track and build a measure of technical moat.

How do we get to AGI? A growing number of industry experts believe AGI means AI that can truly perceive the world, interact with the physical world, and learn from it.

Letting AI learn in the physical world means that, like a human, AI should perceive the world through multimodal data such as language, images, video, and audio, cross-referencing these modalities to form an understanding of the world.

Seen this way, the native omnimodal architecture may well be the foundation and cornerstone of AGI.

Conclusion: Domestic Large Models Enter the "Native Omnimodal" Era

Wenxin 5.0 performs stably across tasks such as knowledge Q&A, complex scene understanding, and creative writing. Its instruction following, context understanding, and multi-round reasoning have matured, and it demonstrates a "spark" and practical value beyond that of a mere tool.

At present, Google has clearly made "native multimodality" its core direction. Wenxin 5.0's realization of "native omnimodality" means China now has a benchmark product on this technical path with both large-scale parameters and practical application capability.

Domestic large models have entered the "native omnimodal" era.

Source: Smart Stuff
