Large model evaluation

7 entries in total


SuperCLUE

A comprehensive evaluation benchmark for Chinese large models. It uses a multi-dimensional, multi-perspective evaluation system to reflect the general ability of large models, and supports technical progress and industrial adoption.

Large model evaluation # Large Model Review

OpenCompass

An open-source evaluation system for large models, designed to assess model capabilities comprehensively and quantitatively across knowledge, language, understanding, reasoning, and other areas, and to drive iterative model optimization.

Large model evaluation # Large Model Review

HELM

An evaluation benchmark initiated by Stanford University that assesses the capabilities of large language models across multiple dimensions and scenarios, with the aim of driving technical progress and model optimization.

Large model evaluation # Large Model Review

MMBench

A multimodal benchmarking framework designed to comprehensively assess and understand the performance of multimodal models in different scenarios, providing robust and reliable evaluation results through a well-designed evaluation process and labeled datasets.

Large model evaluation # Multimodal Evaluation # Test Framework

AGI-Eval Evaluation Community

A comprehensive evaluation platform focused on the general ability of large models in human cognition and problem-solving tasks. Created jointly by well-known universities and organizations, it offers diverse evaluation methods and authoritative rankings to support the development and application of AI technology.

Large model evaluation # Large Model Review

C-Eval

A Chinese foundation model evaluation suite jointly launched by Shanghai Jiao Tong University, Tsinghua University, and the University of Edinburgh. It consists of objective questions spanning multiple domains and difficulty levels, and aims to measure the Chinese comprehension and reasoning ability of large models.

Large model evaluation # Model Evaluation

FlagEval

A comprehensive, scientific, and fair evaluation system and open platform for large models. It aims to help researchers assess the performance of foundation models and training algorithms from all angles by providing multi-dimensional evaluation tools and methods.

Large model evaluation # Large Model Review