wgwang/awesome-LLM-benchmarks

Awesome LLM Benchmarks to evaluate the LLMs across text, code, image, audio, video and more.

☆167

Alternatives and similar repositories for awesome-LLM-benchmarks

Users who are interested in awesome-LLM-benchmarks are comparing it to the repositories listed below.

onejune2018 / Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs…
☆654 · Nov 24, 2025 · Updated 8 months ago
wgwang / awesome-LLMs-In-China
Large language models in China
☆6,460 · Nov 30, 2024 · Updated last year
kevinyaobytedance / llm_eval
LLM evaluation.
☆16 · Nov 7, 2023 · Updated 2 years ago
richard-peng-xia / KD-CGEC
Code for Chinese grammatical error correction based on knowledge distillation
☆11 · Aug 16, 2022 · Updated 3 years ago
CLUEbenchmark / SuperCLUE-Auto
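A benchmark for Chinese large language models in the automotive industry, with fine-grained evaluation based on multi-turn open-ended questions
☆39 · Dec 26, 2023 · Updated 2 years ago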
letsgoLakers / NCIFD
An ethnic culture dataset for large language models
☆13 · May 26, 2025 · Updated last year
pyRis / SEntFiN
Dataset and codes for SEntFiN
☆10 · May 31, 2023 · Updated 3 years ago
Shrinidhi1 / Multiclass-Semantic-Segmentation-for-Road-Surface-Detection
Identification of road surfaces and 12 different classes like speed bumps, paved, unpaved, markings, water puddles, potholes, etc.
☆16 · Jul 21, 2023 · Updated 3 years ago
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43 · Feb 15, 2024 · Updated 2 years ago
terryyz / llm-benchmark
A list of LLM benchmark frameworks.
☆75 · Feb 17, 2024 · Updated 2 years ago
tried42long / Comic-Colorization-with-cGAN
Comic Colorization with cGAN
☆14 · Dec 19, 2018 · Updated 7 years ago
xiongma / DGCNN
Dilation Gate CNN For Machine Reading Comprehension
☆17 · Mar 24, 2023 · Updated 3 years ago
shamilcm / m2scorer
Scorer for grammatical error correction systems.
☆14 · Feb 24, 2016 · Updated 10 years ago
sb-jang / kodialogbench
Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING…
☆18 · Apr 15, 2025 · Updated last year
duiying / Knower
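Knower (知者): a practical open-source knowledge base management platform. Built on Hyperf, it integrates permission management, third-party login (GitHub, QQ), notifications through self-built WeCom (企业微信) apps, and more; it can also serve as a development scaffold for Hyperf.
☆12 · Jan 27, 2022 · Updated 4 years ago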
thu-coai / SafetyBench
Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆296 · Jul 28, 2025 · Updated last year
reilxlx / llava-Qwen2-7B-Instruct-Chinese-CLIP
The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves Chinese text recognition and the ability to interpret the meaning of memes, approaching the recognition level of gpt4o and claude-3.5-sonnet!
☆28 · Jul 23, 2024 · Updated 2 years ago
CLUEbenchmark / SuperCLUE
SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models
☆3,296 · Feb 6, 2026 · Updated 5 months ago
ArtificialZeng / Qwen-Tuning
Qwen-Efficient-Tuning
☆44 · Aug 16, 2023 · Updated 2 years ago
askolik / eqc_for_nco
Neural combinatorial optimization with equivariant quantum circuits.
☆12 · May 13, 2022 · Updated 4 years ago
ispamm / FairDrop
☆14 · Jul 22, 2026 · Updated last week
aotumanbiu / OC-NN
PyTorch implementation of One-Class Convolutional Neural Network; further optimization is planned.
☆13 · Oct 27, 2022 · Updated 3 years ago
rgu-iit-bt / cbr-for-legal-rag
☆20 · Feb 20, 2025 · Updated last year
flageval-baai / FlagEval
FlagEval is an evaluation toolkit for AI large foundation models.
☆338 · Apr 24, 2025 · Updated last year
rciszek / mdr_tcca
Multi-view dimensionality reduction using tensor canonical correlation analysis
☆10 · Jun 28, 2018 · Updated 8 years ago
PetroGPT / PetroGPT
A large language model for the petroleum domain
☆18 · Feb 22, 2024 · Updated 2 years ago
brunocapelao / miniAutoGen
Lightweight and Flexible Library for Creating Agents and Multi-Agent Conversations 🤖
☆30 · May 17, 2026 · Updated 2 months ago
onlookerliu / matrix_theory
☆16 · Dec 26, 2017 · Updated 8 years ago
OpenDFM / MULTI-Benchmark
[SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images
☆47 · Jul 15, 2026 · Updated 2 weeks ago
violet-zct / pytorch_NMT
pytorch attentional NMT (with NLL, MRT, REINFORCE, MIXER training objectives)
☆13 · May 12, 2017 · Updated 9 years ago
BOHRTECHNOLOGY / public_research
Publicly available research done by BOHR.TECHNOLOGY.
☆17 · Dec 8, 2022 · Updated 3 years ago
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆7,246 · Updated this week
casmlab / NPHardEval
Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
☆64 · Mar 26, 2024 · Updated 2 years ago
mynewstart / Tianchi-Multi-Task-Learning
Solution write-up shared by the first-place team, 克莱登大学二队
☆18 · Mar 5, 2021 · Updated 5 years ago
soarsmu / VulMaster_
☆19 · May 27, 2025 · Updated last year
gmcmt / graph_prompt_extension
☆19 · Dec 12, 2023 · Updated 2 years ago
llvqi / multiview_and_self-supervision
multiview and self-supervised learning
☆11 · May 8, 2022 · Updated 4 years ago
Zebrocode / rpc
☆12 · Sep 16, 2020 · Updated 5 years ago
leftthomas / ClipPrompt
A PyTorch implementation of ClipPrompt based on CVPR 2023 paper "CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained…
☆18 · Nov 5, 2023 · Updated 2 years ago