llmeval/LLMEval-2

[AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines

☆71

Alternatives and similar repositories for LLMEval-2

Users who are interested in LLMEval-2 are comparing it to the repositories listed below.

llmeval / LLMEval-1
[AAAI 2024] LLMEval Phase I dataset — 17 categories, 453 questions, 2186 annotators for Chinese LLM evaluation
☆114 · May 21, 2026 · Updated 2 months ago
llmeval / LLMEval-Fair
[ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs — 200K+ generative questions across 13 disciplines
☆40 · May 21, 2026 · Updated 2 months ago
KongLongGeFDU / TransferTOD
The code repository of paper "TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities"
☆20 · May 12, 2026 · Updated 2 months ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆116 · Feb 9, 2024 · Updated 2 years ago
ssbuild / aigc_evals
aigc evals
☆10 · Dec 2, 2023 · Updated 2 years ago
CLUEbenchmark / SuperCLUE-Llama2-Chinese
Comprehensive evaluation of Chinese versions of the open-source Llama2 models, based on the SuperCLUE OPEN benchmark | Llama2 Chinese evaluation with SuperCLUE
☆128 · Aug 2, 2023 · Updated 2 years ago
WooooDyy / BAPO
Codes for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆94 · Jan 29, 2026 · Updated 5 months ago
nilboy / reports
Documentation records
☆15 · Mar 16, 2021 · Updated 5 years ago
CLUEbenchmark / SuperCLUE-Open
An open-domain, multi-turn evaluation benchmark for general-purpose Chinese large models | An Open Domain Benchmark for Foundation Models in Chinese
☆81 · Aug 25, 2023 · Updated 2 years ago
ShuoTang123 / MATRIX-Gen
☆47 · Oct 22, 2024 · Updated last year
FudanNLPLAB / CBook-150K
MD5 links for a Chinese book corpus
☆217 · Jan 31, 2024 · Updated 2 years ago
OpenLMLab / ChatZoo
Light local website for displaying performances from different chat models.
☆86 · Nov 13, 2023 · Updated 2 years ago
luxinyu1 / Chinese-LS
A dataset and baselines for CLS.
☆13 · Sep 3, 2022 · Updated 3 years ago
FudanNLPLAB / MouSi
☆75 · Mar 7, 2024 · Updated 2 years ago
tongjingqi / MathTrap
In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a…
☆60 · Mar 15, 2025 · Updated last year
tjunlp-lab / M3KE
A Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark
☆106 · Jul 20, 2023 · Updated 3 years ago
uclnlp / EMAT
Efficient Memory-Augmented Transformers
☆35 · Dec 5, 2022 · Updated 3 years ago
zepingyu0512 / arithmetic-mechanism
Code for the EMNLP 2024 paper "Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis"
☆12 · Nov 17, 2024 · Updated last year
onejune2018 / Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs…
☆653 · Nov 24, 2025 · Updated 8 months ago
llmeval / Llmeval-Gaokao2024-Math
LLM evaluation on 2024 Chinese Gaokao Mathematics — zero-contamination benchmark with dual prompt formats
☆21 · Apr 15, 2026 · Updated 3 months ago
VIM-Bench / VIM_TOOL
☆12 · Jun 12, 2024 · Updated 2 years ago
dqwang122 / MLROUGE
ROUGE for multilingual summarization
☆25 · Oct 11, 2021 · Updated 4 years ago
yzjiao / RolePred
Source code for EMNLP findings paper "Open-Vocabulary Argument Role Prediction for Event Extraction"
☆19 · Nov 5, 2022 · Updated 3 years ago
OpenLMLab / MOSS_Vortex
Moss Vortex is a lightweight and high-performance deployment and inference backend engineered specifically for MOSS 003, providing a weal…
☆37 · Apr 25, 2023 · Updated 3 years ago
SURF-ML / 2D-VQ-AE-2
2D Vector-Quantized Auto-Encoder for compression of Whole-Slide Images in Histopathology
☆16 · Jul 18, 2024 · Updated 2 years ago
Felixgithub2017 / CG-Eval
Chinese Generation Evaluation
☆13 · Aug 14, 2023 · Updated 2 years ago
lrs1353281004 / ChatGPT_recipes
Continuously tracks ChatGPT-related technical resources and industry developments.
☆11 · Apr 24, 2023 · Updated 3 years ago
OpenLMLab / scaling-rope
Code for "Scaling Laws of RoPE-based Extrapolation"
☆73 · Oct 16, 2023 · Updated 2 years ago
apartresearch / Integer_Addition
Understanding the underlying learning dynamics of simple tasks in Transformer networks
☆19 · Aug 16, 2024 · Updated last year
dunzeng / MORE
Code for the EMNLP'24 paper "On Diversified Preferences of Large Language Model Alignment"
☆16 · Aug 6, 2024 · Updated last year
LLM-MI-Research / Actionable-MI
☆15 · Jan 20, 2026 · Updated 6 months ago
THUKElab / LatEval
☆10 · Mar 19, 2024 · Updated 2 years ago
co0ontty / pocdb
my poc
☆16 · Oct 28, 2020 · Updated 5 years ago
nex-agi / NexHTML
HTML Agent based on NexAU
☆16 · Nov 20, 2025 · Updated 8 months ago
junkangwu / Dr_DPO
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆19 · Jun 1, 2024 · Updated 2 years ago
KuaiSearchPERKS / PERKS
KuaiSearch PERKS
☆12 · Nov 16, 2021 · Updated 4 years ago
MikeGu721 / XiezhiBenchmark
☆98 · Dec 5, 2023 · Updated 2 years ago
xverse-ai / XVERSE-13B
XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.
☆641 · Apr 9, 2024 · Updated 2 years ago
INK-USC / FiD-ICL
"FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023)
☆15 · Jul 24, 2023 · Updated 3 years ago