LMArena's latest leaderboard: DeepSeek-R1's web programming ability catches up with Claude Opus 4

Newsflash · updated 14 hrs ago · AiFun

In the field of open-source models, DeepSeek has delivered another surprise.

On the 28th of last month, DeepSeek shipped a minor update: its R1 reasoning model was upgraded to the latest version (0528), and the model and its weights were released publicly.

This time, R1-0528 further improves benchmark performance, enhances front-end capabilities, reduces hallucinations, and adds support for JSON output and function calling.
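To make the new JSON-output and function-calling support concrete, here is a minimal sketch of what a request might look like. DeepSeek exposes an OpenAI-compatible chat-completions interface, so the payload below follows that convention; the model identifier and the `get_weather` tool are illustrative assumptions, not taken from the article.

```python
import json

# Hypothetical function-calling tool definition in the OpenAI-compatible
# schema: the model can respond by asking the client to invoke this function.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative helper, not a real API
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request payload combining both new features: a tool the model may call,
# and a response_format asking for strict JSON instead of free-form text.
payload = {
    "model": "deepseek-reasoner",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "What's the weather in Hangzhou?"}
    ],
    "tools": [weather_tool],
    "response_format": {"type": "json_object"},
}

print(json.dumps(payload, indent=2))
```

This payload would be POSTed to the provider's chat-completions endpoint; the point here is only the shape of the `tools` and `response_format` fields that R1-0528 now honors.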


Today, LMArena, the well-known public benchmarking platform for large models (recently embroiled in controversy over allegedly favoring models from OpenAI, Google, and Meta), released its latest rankings, and DeepSeek-R1 (0528) stood out in particular.


On the text benchmark (Text), DeepSeek-R1 (0528) ranked 6th overall and 1st among open models.


Breaking it down by category:

  • Ranked #4 in Hard Prompts
  • Ranked #2 in Coding
  • Ranked #5 in Math
  • Ranked #6 in Creative Writing
  • Ranked #9 in Instruction Following
  • Ranked #8 in Longer Query
  • Ranked #7 in Multi-Turn

In addition, on the WebDev Arena leaderboard, DeepSeek-R1 (0528) tied for first place with closed-source models such as Gemini-2.5-Pro-Preview-06-05 and Claude Opus 4 (20250514), edging out Claude Opus 4 on raw score.


WebDev Arena is a real-time AI programming competition platform developed by the LMArena team. It pits large language models against each other in web-development challenges and measures human preference for each model's ability to build attractive, functional web applications.

DeepSeek-R1 (0528)'s strong showing has prompted more people to try it.


Commentators also noted that, given Claude has long been the benchmark in AI programming, DeepSeek-R1 (0528) matching Claude Opus in performance is a milestone, and a pivotal moment for open-source AI.

DeepSeek-R1 (0528) delivers leading performance under the fully open MIT license and rivals the best closed-source models. While this breakthrough is most evident in web development, its impact may extend to the broader field of programming.

However, raw benchmark numbers do not define real-world performance. While DeepSeek-R1 (0528) may match Claude in technical capability, whether it can deliver a comparable user experience in day-to-day workflows remains to be verified in more real-world settings.


 

(Text: Heart of the Machine)
