lyt719 / LLM-evaluation-datasets
☆37 · Updated last year
Alternatives and similar repositories for LLM-evaluation-datasets
Users who are interested in LLM-evaluation-datasets are comparing it to the libraries listed below.
- Flames is a highly adversarial benchmark in Chinese for evaluating LLMs' harmlessness, developed by Shanghai AI Lab and Fudan NLP Group. ☆63 · Updated last year
- Controllable Text Generation for Large Language Models: A Survey ☆199 · Updated last year
- ☆182 · Updated last week
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆270 · Updated 6 months ago
- Neural Code Intelligence Survey 2024-25; reading lists and resources ☆280 · Updated 6 months ago
- A collection of survey papers and resources related to Large Language Models (LLMs). ☆40 · Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆151 · Updated last year
- LLM hallucination paper list ☆331 · Updated last year
- Awesome papers for role-playing with language models ☆216 · Updated last year
- Large Language Models (LLMs) of Code ☆20 · Updated 2 years ago
- The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity" ☆341 · Updated last year
- [ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future ☆489 · Updated last year
- A collection of math word problem (MWP) works, including datasets, algorithms, and so on. ☆47 · Updated last year
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆258 · Updated 5 months ago
- This is the repository for the Tool Learning survey. ☆476 · Updated 5 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ☆46 · Updated last year
- A live reading list for LLM data synthesis (Updated to July 2025). ☆446 · Updated 5 months ago
- An Awesome Collection for LLM Survey ☆383 · Updated 8 months ago
- ☆54 · Updated 10 months ago
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation ☆56 · Updated last year
- The awesome agents in the era of large language models ☆71 · Updated 2 years ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆415 · Updated 7 months ago
- The related works and background techniques about OpenAI o1 ☆221 · Updated last year
- ☆216 · Updated last year
- [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models ☆39 · Updated last year
- ☆21 · Updated last year
- [COLM'24] Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration ☆32 · Updated last year
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆136 · Updated last year
- Templates and examples for ACL and EMNLP conference posters. ☆14 · Updated last year
- Repository for "Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning" ☆168 · Updated last year