lyt719 / LLM-evaluation-datasets
☆29 · Updated last year
Alternatives and similar repositories for LLM-evaluation-datasets
Users interested in LLM-evaluation-datasets are comparing it to the repositories listed below.
- Controllable Text Generation for Large Language Models: A Survey · ☆181 · Updated 10 months ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] · ☆228 · Updated last year
- Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group. · ☆56 · Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" · ☆127 · Updated 9 months ago
- ☆141 · Updated 10 months ago
- A collection of survey papers and resources related to Large Language Models (LLMs). · ☆40 · Updated last year
- The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity" · ☆340 · Updated last year
- [ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future · ☆454 · Updated 6 months ago
- Large Language Models (LLMs) of Code · ☆18 · Updated 2 years ago
- Awesome papers for role-playing with language models · ☆194 · Updated 8 months ago
- A collection of math word problem (MWP) works, including datasets, algorithms, and so on. · ☆44 · Updated last year
- Neural Code Intelligence Survey 2024; reading lists and resources · ☆265 · Updated 3 weeks ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. · ☆42 · Updated last year
- LLM hallucination paper list · ☆319 · Updated last year
- ☆142 · Updated last year
- A live reading list for LLM-synthetic-data. · ☆308 · Updated last week
- An up-to-date curated list of Retrieval-Augmented Generation (RAG) for LLMs. · ☆115 · Updated last month
- ☆324 · Updated last year
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models · ☆73 · Updated 2 weeks ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" · ☆130 · Updated last year
- Official repository for the paper "COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis".