onejune2018 / Awesome-LLM-EvalLinks

Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs. 一个由工具、基准/数据、演示、排行榜和大模型等组成的精选列表，主要面向基础大模型评测，旨在探求生成式AI的技术边界.

☆574

Alternatives and similar repositories for Awesome-LLM-Eval

Users that are interested in Awesome-LLM-Eval are comparing it to the libraries listed below

Sorting:

THUDM / AlignBench
大模型多维度中文对齐评测基准 (ACL 2024)
☆414Updated last year
Wang-Shuo / A-Guide-to-Retrieval-Augmented-LLM
an intro to retrieval augmented large language model
☆301Updated 2 years ago
wgwang / awesome-LLM-benchmarks
Awesome LLM Benchmarks to evaluate the LLMs across text, code, image, audio, video and more.
☆151Updated last year
ZigeW / data_management_LLM
Collection of training data management explorations for large language models
☆334Updated last year
OpenBMB / BMPrinciples
A collection of phenomenons observed during the scaling of big foundation models, which may be developed into consensus, principles, or l…
☆284Updated 2 years ago
X-PLUG / CValues
面向中文大模型价值观的评估与对齐研究
☆540Updated 2 years ago
thunlp / ToolLearningPapers
☆908Updated last year
pengr / LLM-Synthetic-Data
A live reading list for LLM data synthesis (Updated to July, 2025).
☆398Updated 2 months ago
HqWu-HITCS / Awesome-LLM-Survey
An Awesome Collection for LLM Survey
☆377Updated 5 months ago
THUDM / LongBench
LongBench v2 and LongBench (ACL 25'&24')
☆997Updated 9 months ago
GPT-Fathom / GPT-Fathom
GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well a…
☆347Updated last year
OpenBMB / UltraEval
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
☆251Updated 11 months ago
thu-coai / BPO
☆328Updated last year
WeOpenML / PandaLM
☆922Updated last year
flageval-baai / FlagEval
FlagEval is an evaluation toolkit for AI large foundation models.
☆338Updated 6 months ago
OpenLMLab / GAOKAO-Bench
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
☆691Updated 9 months ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆398Updated 4 months ago
multimodal-art-projection / MAP-NEO
☆964Updated 8 months ago
IAAR-Shanghai / CRUD_RAG
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
☆337Updated 5 months ago
chaoswork / sft_datasets
开源SFT数据集整理,随时补充
☆547Updated 2 years ago
quchangle1 / LLM-Tool-Survey
This is the repository for the Tool Learning survey.
☆449Updated 2 months ago
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆572Updated 10 months ago
RUC-GSAI / YuLan-Chat
YuLan: An Open-Source Large Language Model
☆633Updated 9 months ago
open-compass / T-Eval
[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
☆294Updated last year
LLaMafia / llamafia.github
☆321Updated last year
Glanvery / LLM-Travel
欢迎来到 "LLM-travel" 仓库！探索大语言模型（LLM）的奥秘 🚀。致力于深入理解、探讨以及实现与大模型相关的各种技术、原理和应用。
☆347Updated last year
jianzhnie / LLamaTuner
Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM , Falcon) 大模型高效量化训练+部署.
☆616Updated 9 months ago
haonan-li / CMMLU
CMMLU: Measuring massive multitask language understanding in Chinese
☆790Updated 10 months ago
yongzhuo / LLM-SFT
中文大模型微调(LLM-SFT), 数学指令数据集MWP-Instruct, 支持模型(ChatGLM-6B, LLaMA, Bloom-7B, baichuan-7B), 支持(LoRA, QLoRA, DeepSpeed, UI, TensorboardX), 支持(微…
☆211Updated last year
SUFE-AIFLM-Lab / FinEval
FinEval是一个中文金融领域高质量多项选择与文本问答题的集合。
☆233Updated 4 months ago