leobeeson / llm_benchmarks
A collection of benchmarks and datasets for evaluating LLMs.
☆454 · Updated 10 months ago
Alternatives and similar repositories for llm_benchmarks
Users who are interested in llm_benchmarks are comparing it to the libraries listed below.
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆708 · Updated 2 months ago
- A collection of 150+ surveys on LLMs ☆301 · Updated 3 months ago
- LongBench v2 and LongBench (ACL 2024) ☆888 · Updated 4 months ago
- Automatic evals for LLMs ☆407 · Updated this week
- An implementation of the paper "Searching for Best Practices in Retrieval-Augmented Generation" (EMNLP 2024) ☆320 · Updated 5 months ago
- RewardBench: the first evaluation tool for reward models. ☆590 · Updated this week
- The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024] ☆247 · Updated 3 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆353 · Updated 9 months ago
- Compress your input to ChatGPT or other LLMs to let them process 2x more content and save 40% memory and GPU time. ☆383 · Updated last year
- Official repository for ORPO ☆453 · Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆351 · Updated last month
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆547 · Updated last year
- Summarizes existing representative LLM text datasets. ☆1,278 · Updated 2 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆322 · Updated last year
- A curated list of retrieval-augmented generation (RAG) resources for large language models ☆276 · Updated 3 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆242 · Updated 6 months ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation ☆299 · Updated 7 months ago
- A reading list on LLM-based Synthetic Data Generation 🔥 ☆1,287 · Updated 2 weeks ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆231 · Updated 9 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,574 · Updated last week
- A curated list of awesome instruction tuning datasets, models, papers and repositories. ☆335 · Updated last year
- awesome synthetic (text) datasets ☆281 · Updated 7 months ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] ☆555 · Updated 5 months ago
- An Open Source Toolkit For LLM Distillation ☆618 · Updated this week
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆493 · Updated 4 months ago
- Repository for "MultiHop-RAG: A Dataset for Evaluating Retrieval-Augmented Generation Across Documents" (COLM 2024) ☆324 · Updated 2 months ago
- Collection of training data management explorations for large language models ☆325 · Updated 10 months ago