leobeeson / llm_benchmarks
A collection of benchmarks and datasets for evaluating LLMs.
☆474 · Updated last year
Alternatives and similar repositories for llm_benchmarks
Users interested in llm_benchmarks are comparing it to the libraries listed below.
- The papers are organized according to our survey: "Evaluating Large Language Models: A Comprehensive Survey". ☆776 · Updated last year
- Chat Templates for 🤗 HuggingFace Large Language Models. ☆684 · Updated 7 months ago
- List of papers on hallucination detection in LLMs. ☆916 · Updated last month
- A reading list on LLM-based Synthetic Data Generation 🔥 ☆1,338 · Updated last month
- ☆382 · Updated last month
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆730 · Updated 4 months ago
- The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]. ☆259 · Updated 4 months ago
- Automatic evals for LLMs. ☆467 · Updated 3 weeks ago
- A curated list of retrieval-augmented generation (RAG) in large language models. ☆285 · Updated 5 months ago
- Awesome-LLM-Prompt-Optimization: a curated list of advanced prompt optimization and tuning methods in Large Language Models. ☆352 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆612 · Updated last month
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. ☆543 · Updated last year
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends. ☆1,722 · Updated last week
- Official repository for ORPO. ☆458 · Updated last year
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,800 · Updated 6 months ago
- Aligning Large Language Models with Human: A Survey. ☆729 · Updated last year
- A collection of awesome-prompt-datasets and awesome-instruction-datasets for training ChatLLMs such as ChatGPT (collects a wide variety of instruction datasets for training ChatLLM models). ☆685 · Updated last year
- A collection of 150+ surveys on LLMs. ☆316 · Updated 4 months ago
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ☆536 · Updated 8 months ago
- ☆840 · Updated 2 weeks ago
- This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models. ☆488 · Updated last year
- ☆582 · Updated last month
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models". ☆503 · Updated 6 months ago
- 📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥 ☆1,599 · Updated this week
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆868 · Updated 3 weeks ago
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Models. ☆581 · Updated 3 weeks ago
- Evaluate your LLM's response with Prometheus and GPT4 🎯 ☆963 · Updated 2 months ago
- ☆547 · Updated last year
- Doing simple retrieval from LLM models at various context lengths to measure accuracy. ☆1,934 · Updated 11 months ago
- ☆949 · Updated 5 months ago