SuperCLUE


A comprehensive evaluation tool for Chinese large models. It reflects the general capabilities of large models through a multi-dimensional, multi-perspective evaluation system and supports their technical progress and industrial development.

Location:
China
Language:
zh
Collection time:
2024-06-30

The SuperCLUE evaluation system is an open-source tool for evaluating Chinese large models. It aims to reflect the general capabilities of large models through a multi-dimensional, multi-perspective evaluation system.

Sponsoring organizations and background

The SuperCLUE evaluation system was jointly built by Tsinghua University, ModelBest, Zhihu, and other organizations in the OpenBMB open-source community. Its roots trace back to CLUE (Chinese Language Understanding Evaluation), a third-party Chinese language understanding benchmark that has aimed to provide scientific, objective, and neutral evaluation of language models since its launch in 2019.

Evaluation features

  1. Multi-dimensional comprehensive evaluation: SuperCLUE evaluates models along several dimensions, including basic competency, professional competency, and Chinese-specific competency. Basic competency covers 10 abilities such as semantic comprehension, conversation, and logical reasoning. Professional competency draws on secondary-school, university, and professional exams, covering more than 50 abilities ranging from mathematics, physics, and geography to the social sciences. Chinese-specific competency targets tasks with Chinese characteristics, such as Chinese idioms and poetry.
  2. Automated evaluation technology: As a fully independent third-party evaluator, SuperCLUE uses automated evaluation to reduce the uncertainty introduced by human factors, with the goal of producing unbiased and objective results.
  3. Open-ended subjective question evaluation: To stay consistent with real user experience, SuperCLUE includes open-ended subjective questions posed in dialogue form. Backed by a multi-dimensional, multi-perspective, multi-level evaluation system, these questions simulate real application scenarios for large models and test their generation capability.
  4. Multi-turn dialogue evaluation: SuperCLUE builds multi-turn dialogue scenarios to examine in greater depth how large models perform in real multi-turn conversations, evaluating their handling of context, memory, and overall conversational ability.

Evaluation datasets and tasks

SuperCLUE's evaluation dataset contains 2,194 questions covering ten basic tasks: computation, logical reasoning, code, tool use, encyclopedic knowledge, language comprehension, long text, role play, generation and creation, and safety. For example, in the April 2024 evaluation, Unisound's Shanhai large model achieved a total score of 69.51, placing it among the top 10 large models in China. On long-text capability, which matters for industrial deployment, Shanhai scored 68.2, ranking fourth among models worldwide and third among Chinese models.

Impact and significance

The SuperCLUE evaluation system provides important guidance for the technical progress and application of large models. By comparing how different models perform on SuperCLUE, researchers and developers can better understand each model's strengths and weaknesses and make targeted improvements. SuperCLUE also serves as an important reference for deploying large models in industry, helping to promote the practical application and industrialization of large model technology.

In conclusion, SuperCLUE is a comprehensive, objective, and fair evaluation tool for Chinese large models that provides strong support for the research and application of large model technology.
