THUDM/AlignBench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/THUDM/AlignBench)

THUDM / AlignBench

大模型多维度中文对齐评测基准 (ACL 2024)

☆428

Alternatives and similar repositories for AlignBench

Users that are interested in AlignBench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

thu-coai / CritiqueLLM
View on GitHub
☆148Jul 1, 2024Updated last year
thu-coai / ComplexBench
View on GitHub
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆103Feb 20, 2025Updated last year
open-compass / opencompass
View on GitHub
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, …
☆7,003Updated this week
flageval-baai / FlagEval
View on GitHub
FlagEval is an evaluation toolkit for AI large foundation models.
☆337Apr 24, 2025Updated last year
thu-coai / Safety-Prompts
View on GitHub
Chinese safety prompts for evaluating and improving the safety of LLMs. 中文安全prompts，用于评估和提升大模型的安全性。
☆1,158Feb 27, 2024Updated 2 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
YJiangcm / FollowBench
View on GitHub
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆118Jun 12, 2025Updated 11 months ago
onejune2018 / Awesome-LLM-Eval
View on GitHub
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs…
☆636Nov 24, 2025Updated 5 months ago
meowpass / FollowComplexInstruction
View on GitHub
Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…
☆53Jun 24, 2024Updated last year
OpenMOSS / HalluQA
View on GitHub
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
☆139Jun 5, 2024Updated last year
OpenRLHF / OpenRLHF
View on GitHub
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,523Updated this week
open-compass / MathBench
View on GitHub
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆114May 22, 2025Updated 11 months ago
GAIR-NLP / auto-j
View on GitHub
Generative Judge for Evaluating Alignment
☆250Jan 18, 2024Updated 2 years ago
morecry / CharacterEval
View on GitHub
☆293May 27, 2025Updated 11 months ago
icip-cas / awesome-auto-alignment
View on GitHub
Collection of papers for scalable automated alignment.
☆93Oct 22, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
THUDM / LongBench
View on GitHub
LongBench v2 and LongBench (ACL 25'&24')
☆1,169Jan 15, 2025Updated last year
THUDM / AgentBench
View on GitHub
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,418Feb 8, 2026Updated 3 months ago
OFA-Sys / InsTag
View on GitHub
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆285Aug 20, 2023Updated 2 years ago
X-PLUG / CValues
View on GitHub
面向中文大模型价值观的评估与对齐研究
☆556Jul 20, 2023Updated 2 years ago
princeton-nlp / LLMBar
View on GitHub
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆138Jul 8, 2024Updated last year
GAIR-NLP / O1-Journey
View on GitHub
O1 Replication Journey
☆2,000Jan 14, 2025Updated last year
PKU-Alignment / safe-rlhf
View on GitHub
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,601Nov 24, 2025Updated 5 months ago
LianjiaTech / BELLE
View on GitHub
BELLE: Be Everyone's Large Language model Engine（开源中文对话大模型）
☆8,282Oct 16, 2024Updated last year
hkust-nlp / ceval
View on GitHub
Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
☆1,849Jul 27, 2025Updated 9 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
tianyi-lab / Cherry_LLM
View on GitHub
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆416Jun 25, 2025Updated 10 months ago
NVIDIA / NeMo-Aligner
View on GitHub
Scalable toolkit for efficient model alignment
☆854Oct 6, 2025Updated 7 months ago
thu-coai / SafetyBench
View on GitHub
Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆287Jul 28, 2025Updated 9 months ago
lmarena / arena-hard-auto
View on GitHub
Arena-Hard-Auto: An automatic LLM benchmark.
☆1,020Jun 21, 2025Updated 11 months ago
CLUEbenchmark / SuperCLUE
View on GitHub
SuperCLUE: 中文通用大模型综合性基准 | A Benchmark for Foundation Models in Chinese
☆3,286Feb 6, 2026Updated 3 months ago
multimodal-art-projection / MAP-NEO
View on GitHub
☆987Feb 7, 2025Updated last year
THUDM / ChatGLM-Math
View on GitHub
☆83Apr 18, 2024Updated 2 years ago
qinyiwei / InfoBench
View on GitHub
☆59Aug 22, 2024Updated last year
QwenLM / AutoIF
View on GitHub
☆332Jul 25, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
bigscience-workshop / Megatron-DeepSpeed
View on GitHub
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,440Mar 20, 2024Updated 2 years ago
haonan-li / CMMLU
View on GitHub
CMMLU: Measuring massive multitask language understanding in Chinese
☆814Dec 6, 2024Updated last year
hkust-nlp / deita
View on GitHub
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆596Dec 9, 2024Updated last year
GAIR-NLP / abel
View on GitHub
SOTA Math Opensource LLM
☆336Dec 12, 2023Updated 2 years ago
THUDM / LongAlign
View on GitHub
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆260Dec 16, 2024Updated last year
OpenLMLab / LEval
View on GitHub
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
☆403Jul 9, 2024Updated last year
openreasoner / openr
View on GitHub
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
☆1,842Jan 17, 2025Updated last year