lyt719 / LLM-evaluation-datasets
☆32 · Updated last year
Alternatives and similar repositories for LLM-evaluation-datasets
Users interested in LLM-evaluation-datasets are comparing it to the repositories listed below
- Controllable Text Generation for Large Language Models: A Survey ☆192 · Updated last year
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆260 · Updated 3 months ago
- A collection of survey papers and resources related to Large Language Models (LLMs). ☆40 · Updated last year
- Flames is a highly adversarial Chinese benchmark for evaluating LLMs' harmlessness, developed by Shanghai AI Lab and the Fudan NLP Group. ☆60 · Updated last year
- ☆167 · Updated last year
- A collection of math word problem (MWP) works, including datasets, algorithms, and more. ☆44 · Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆144 · Updated last year
- ☆146 · Updated last year
- Code repo for EfficientRAG: Efficient Retriever for Multi-Hop Question Answering ☆60 · Updated 8 months ago
- Templates and examples for ACL and EMNLP conference posters. ☆14 · Updated last year
- ☆21 · Updated last year
- Awesome papers for role-playing with language models ☆208 · Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆91 · Updated 5 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ☆44 · Updated last year
- [ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future ☆468 · Updated 9 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models ☆400 · Updated 4 months ago
- Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models" ☆135 · Updated last year
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆214 · Updated last year
- ☆345 · Updated last year
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models ☆102 · Updated 3 weeks ago
- Neural Code Intelligence Survey 2024; reading lists and resources ☆275 · Updated 3 months ago
- A live reading list for LLM data synthesis (updated to July 2025). ☆399 · Updated 2 months ago
- Full-parameter, LoRA, and QLoRA fine-tuning of Llama 3. ☆210 · Updated last year
- LLM hallucination paper list ☆323 · Updated last year
- A curated list of awesome works in the Routing LLMs paradigm (👉 contributions to this repository are welcome) ☆70 · Updated 3 weeks ago
- ☆46 · Updated 7 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆251 · Updated last year
- ☆85 · Updated last year
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started with. ☆46 · Updated last year
- ☆210 · Updated last year