llmeval/LLMEval-1

[AAAI 2024] LLMEval Phase I dataset — 17 categories, 453 questions, 2186 annotators for Chinese LLM evaluation

☆114

Alternatives and similar repositories for LLMEval-1

Users who are interested in LLMEval-1 are comparing it to the repositories listed below.

llmeval / LLMEval-2
[AAAI 2024] LLMEval Phase II dataset — professional domain evaluation across 12 academic disciplines
☆71 · May 21, 2026 · Updated 2 months ago
llmeval / LLMEval-Fair
[ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs — 200K+ generative questions across 13 disciplines
☆40 · May 21, 2026 · Updated 2 months ago
i-Eval / FairEval
☆145 · Sep 10, 2023 · Updated 2 years ago
WooooDyy / BAPO
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆94 · Jan 29, 2026 · Updated 5 months ago
nuaa-nlp / Evaluation-of-ChatGPT
☆14 · Apr 15, 2023 · Updated 3 years ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆116 · Feb 9, 2024 · Updated 2 years ago
ExpressAI / AI-Gaokao
Gaokao Benchmark for AI
☆109 · Jul 8, 2022 · Updated 4 years ago
thu-coai / CritiqueLLM
☆147 · Jul 1, 2024 · Updated 2 years ago
OpenLMLab / OpenChineseLLaMA
Chinese large language model base generated through incremental pre-training on Chinese datasets
☆239 · May 30, 2023 · Updated 3 years ago
WeOpenML / PandaLM
☆926 · May 22, 2024 · Updated 2 years ago
intro-nlp / intro-nlp.github.io
Introduction to Natural Language Processing (《自然语言处理概论》), by Qi Zhang, Tao Gui, and Xuanjing Huang
☆122 · Sep 10, 2023 · Updated 2 years ago
ruixiangcui / AGIEval
☆774 · Jun 13, 2024 · Updated 2 years ago
xiami2019 / CLAIF
[Findings of ACL 2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback
☆40 · Aug 14, 2023 · Updated 2 years ago
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆39 · Sep 5, 2024 · Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆138 · Jul 8, 2024 · Updated 2 years ago
OpenLMLab / ChatZoo
Lightweight local website for displaying the performance of different chat models.
☆86 · Nov 13, 2023 · Updated 2 years ago
Blue-Raincoat / SelectIT
☆24 · Oct 14, 2024 · Updated last year
ExpressAI / reStructured-Pretraining
reStructured Pre-training
☆99 · Dec 22, 2022 · Updated 3 years ago
THU-KEG / KoLA
[ICLR 2024] The open-source repo of THU-KEG's KoLA benchmark.
☆57 · Sep 28, 2023 · Updated 2 years ago
FudanNLPLAB / CBook-150K
MD5 links for a Chinese book corpus
☆217 · Jan 31, 2024 · Updated 2 years ago
OFA-Sys / ExpertLLaMA
An open-source chatbot built with ExpertPrompting which achieves 96% of ChatGPT's capability.
☆298 · May 31, 2023 · Updated 3 years ago
artpli / CodeIE
[ACL 2023] CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors
☆42 · Dec 14, 2025 · Updated 7 months ago
MoFHeka / LLaMA-Megatron
A LLaMA1/LLaMA2 Megatron implementation.
☆28 · Dec 13, 2023 · Updated 2 years ago
Judenpech / MLEC-QA
Data and baseline code of EMNLP 2021 paper "MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset".
☆32 · Nov 5, 2021 · Updated 4 years ago
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 · Mar 5, 2024 · Updated 2 years ago
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆118 · Jun 12, 2025 · Updated last year
OFA-Sys / OFA-Compress
OFA-Compress is a unified framework which provides OFA model finetuning, distillation and inference capabilities in Huggingface version, …
☆29 · Sep 22, 2022 · Updated 3 years ago
yhcc / utcie
This is the code repo for the paper "UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction"
☆15 · Aug 10, 2023 · Updated 2 years ago
yegcjs / mixinglaws
☆113 · Jul 15, 2025 · Updated last year
declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
☆552 · Mar 10, 2024 · Updated 2 years ago
acl-org / emnlp-2023
Repository containing the website for the EMNLP 2023 conference
☆17 · Feb 12, 2025 · Updated last year
OpenLMLab / Sniffer
☆27 · Jun 5, 2023 · Updated 3 years ago
Neutralzz / BiLLa
BiLLa: A Bilingual LLaMA with Enhanced Reasoning Ability
☆415 · Jun 1, 2023 · Updated 3 years ago
open-compass / CriticEval
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
☆49 · Nov 29, 2024 · Updated last year
howl-anderson / MicroWeatherBot_CN
A simple, micro Chinese weather-query bot demo built on the rasa 1.x framework
☆12 · Aug 14, 2019 · Updated 6 years ago
choosewhatulike / cluster-clip
Multi-GPU supported k-means clustering for cluster-clip
☆15 · Jun 3, 2024 · Updated 2 years ago
wzhouad / WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
☆41 · Sep 24, 2024 · Updated last year
tengxiaoliu / XoT
[EMNLP 2023] Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts
☆27 · Nov 4, 2023 · Updated 2 years ago
hengyicai / ContrastiveLearning4Dialogue
The codebase for "Group-wise Contrastive Learning for Neural Dialogue Generation" (Cai et al., Findings of EMNLP 2020)
☆55 · Feb 24, 2021 · Updated 5 years ago