lyt719 / LLM-evaluation-datasets

☆38

Alternatives and similar repositories for LLM-evaluation-datasets

Users interested in LLM-evaluation-datasets are comparing it to the repositories listed below.

IAAR-Shanghai / CTGSurvey
Controllable Text Generation for Large Language Models: A Survey
☆199 · Updated last year
junzhuang-code / LLMSurveySummary
A collection of survey papers and resources related to Large Language Models (LLMs).
☆40 · Updated 2 years ago
thu-coai / SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆272 · Updated 6 months ago
AmourWaltz / Awesome-Reliable-LLM
☆186 · Updated 3 weeks ago
AI45Lab / Flames
Flames is a highly adversarial Chinese-language benchmark for evaluating LLMs' harmlessness, developed by Shanghai AI Lab and Fudan NLP Group.
☆63 · Updated last year
xsc1234 / INFO-RAG
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation
☆56 · Updated last year
pengr / LLM-Synthetic-Data
A live reading list for LLM data synthesis (updated through July 2025).
☆449 · Updated 5 months ago
nuochenpku / Awesome-Role-Play-Papers
Awesome papers for role-playing with language models
☆218 · Updated last year
zchuz / CoT-Reasoning-Survey
[ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
☆490 · Updated last year
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆151 · Updated last year
quchangle1 / LLM-Tool-Survey
This is the repository for the Tool Learning survey.
☆478 · Updated 6 months ago
QiushiSun / Awesome-Code-Intelligence
Neural Code Intelligence Survey 2024-25; Reading lists and resources
☆280 · Updated 6 months ago
LuckyyySTA / Awesome-LLM-hallucination
LLM hallucination paper list
☆331 · Updated last year
xbmxb / RAG-query-rewriting
☆218 · Updated last year
CASIA-LM / MoDS
☆147 · Updated last year
thu-coai / AutoDetect
Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.
☆46 · Updated last year
NIL-zhuang / EfficientRAG-official
Code repository for EfficientRAG: Efficient Retriever for Multi-Hop Question Answering
☆64 · Updated 11 months ago
wangcunxiang / LLM-Factuality-Survey
The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity"
☆341 · Updated last year
chen700564 / RGB
☆357 · Updated last year
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆416 · Updated 7 months ago
shizhl / Multi-Agent-Papers
Awesome agents in the era of large language models
☆71 · Updated 2 years ago
Hongcheng-Gao / Awesome-Long2short-on-LRMs
Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…
☆258 · Updated 5 months ago
PolarisRisingWar / Math_Word_Problem_Collection
A collection for math word problem (MWP) works, including datasets, algorithms and so on.
☆47 · Updated last year
sugarandgugu / Simple-Trl-Training
Fine-tune large language models with the DPO algorithm; simple and easy to get started.
☆50 · Updated last year
dongguanting / DPA-RAG
The code and data for DPA-RAG, accepted to the WWW 2025 main conference.
☆63 · Updated 3 months ago
chenchen0103 / ACEBench
☆165 · Updated 3 months ago
RUCAIBox / HaluAgent
☆21 · Updated last year
hscspring / rl-llm-nlp
Reinforcement Learning in LLM and NLP.
☆62 · Updated last month
plageon / SlimPlm
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs (ACL 2024)
☆73 · Updated 9 months ago
wjn1996 / Awesome-LLM-Reasoning-Openai-o1-Survey
Related work and background techniques for OpenAI o1
☆220 · Updated last year