llmeval/LLMEval-Fair

[ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs — 200K+ generative questions across 13 disciplines

☆40

Alternatives and similar repositories for LLMEval-Fair

Users who are interested in LLMEval-Fair are comparing it to the repositories listed below.

llmeval / LLMEval-2
[AAAI 2024] LLMEval Phase II dataset: professional domain evaluation across 12 academic disciplines
☆71 · May 21, 2026 · Updated 2 months ago
llmeval / Llmeval-Gaokao2024-Math
LLM evaluation on 2024 Chinese Gaokao Mathematics: zero-contamination benchmark with dual prompt formats
☆21 · Apr 15, 2026 · Updated 3 months ago
KongLongGeFDU / TransferTOD
The code repository of paper "TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities"
☆20 · May 12, 2026 · Updated 2 months ago
llmeval / LLMEval-1
[AAAI 2024] LLMEval Phase I dataset: 17 categories, 453 questions, 2186 annotators for Chinese LLM evaluation
☆114 · May 21, 2026 · Updated 2 months ago
tongjingqi / MathTrap
In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a…
☆60 · Mar 15, 2025 · Updated last year
ssbuild / aigc_evals
aigc evals
☆10 · Dec 2, 2023 · Updated 2 years ago
junkangwu / Dr_DPO
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆19 · Jun 1, 2024 · Updated 2 years ago
aypan17 / reward-misspecification
☆10 · Mar 13, 2023 · Updated 3 years ago
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆102 · Feb 20, 2025 · Updated last year
yzjiao / RolePred
Source code for EMNLP findings paper "Open-Vocabulary Argument Role Prediction for Event Extraction"
☆19 · Nov 5, 2022 · Updated 3 years ago
lamps-lab / Patent-figure-segmentor
☆14 · Aug 12, 2022 · Updated 3 years ago
Reza-esfandiarpoor / the-mcp-company
TheMCPCompany: Creating General-purpose Agents with Task-specific Tools
☆16 · Dec 19, 2025 · Updated 7 months ago
BackupGithub-AI / LAH
☆10 · Mar 28, 2023 · Updated 3 years ago
TARGET-SIDE-DATA-AUG / TSDASG
Source Code for <Target-Side Data Augmentation for Sequence Generation>
☆12 · Oct 6, 2021 · Updated 4 years ago
ShuoTang123 / MATRIX-Gen
☆47 · Oct 22, 2024 · Updated last year
SWE-Gym / SWE-Bench-Fork
☆13 · Mar 5, 2025 · Updated last year
jiangshdd / ReviewCritique
☆13 · Sep 26, 2024 · Updated last year
JieQin-AI / MGD-SSSS
☆14 · Dec 14, 2023 · Updated 2 years ago
LINs-lab / M3
[ICLR 2024] Towards Robust Multi-Modal Reasoning via Model Selection
☆14 · Mar 7, 2024 · Updated 2 years ago
CLUEbenchmark / SuperCLUE-Auto
A Chinese LLM evaluation benchmark for the automotive industry, with fine-grained evaluation based on multi-turn open-ended questions
☆39 · Dec 26, 2023 · Updated 2 years ago
thunlp / SE-Bench
Official repo for "SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization"
☆28 · Mar 24, 2026 · Updated 4 months ago
RUCAIBox / HaluAgent
☆23 · Jul 1, 2024 · Updated 2 years ago
OpenLMLab / scaling-rope
code for Scaling Laws of RoPE-based Extrapolation
☆73 · Oct 16, 2023 · Updated 2 years ago
Mixture-AI / meta-llama-explain
Explanation of the llama2 repo.
☆12 · Jul 18, 2024 · Updated 2 years ago
VickiCui / MORE
Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning"
☆11 · Oct 11, 2024 · Updated last year
JerryYin777 / Jerry_CV
☆13 · Jan 21, 2024 · Updated 2 years ago
esantus / EVALution
Dataset containing Semantic Relations and Metadata, for Training and Evaluating Distributional Semantic Models in English and Mandarin Ch…
☆16 · Aug 7, 2017 · Updated 8 years ago
AI4Patents / IMPACT
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)
☆18 · Jul 14, 2025 · Updated last year
pkunlp-icler / TSAR
Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022
☆19 · May 1, 2022 · Updated 4 years ago
flageval-baai / FlagEval
FlagEval is an evaluation toolkit for AI large foundation models.
☆338 · Apr 24, 2025 · Updated last year
connoryyan / hachimi-automaton
A tool that automatically generates Hachimi music from MIDI files
☆15 · Jun 5, 2026 · Updated last month
Samyu0304 / thought-propagation
Code and dataset for the ICLR 2024 paper "Thought Propagation: An analogical Approach to Complex Reasoning with Large Language Models."
☆16 · Mar 4, 2024 · Updated 2 years ago
ritzz-ai / PACS
☆31 · Sep 12, 2025 · Updated 10 months ago
ruc-datalab / SC-prompt
☆12 · May 13, 2023 · Updated 3 years ago
selkerdawy / FTWT
Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction
☆10 · May 25, 2022 · Updated 4 years ago
HKUST-KnowComp / AbductiveKGR
[ACL 2024] Implementation for Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation
☆15 · Oct 9, 2025 · Updated 9 months ago
TIGER-AI-Lab / TheoremQA
The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)
☆40 · May 15, 2024 · Updated 2 years ago
jdongca2003 / next_utterance_selection
☆12 · May 7, 2018 · Updated 8 years ago
GavinKerrigan / conf_matrix_and_calibration
☆12 · Aug 9, 2022 · Updated 3 years ago