amazon-science / auto-rag-eval
Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"
☆86 · Updated last year
Alternatives and similar repositories for auto-rag-eval
Users interested in auto-rag-eval are comparing it to the repositories listed below.
- RefChecker provides an automatic checking pipeline and a benchmark dataset for detecting fine-grained hallucinations generated by Large Langua… ☆417 · Updated 8 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆223 · Updated last year
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆168 · Updated 2 years ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆598 · Updated last year
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆408 · Updated 2 years ago
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning" ☆208 · Updated last month
- Comprehensive benchmark for RAG ☆260 · Updated 7 months ago
- Automated Evaluation of RAG Systems ☆687 · Updated 10 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆176 · Updated 2 weeks ago
- ☆236 · Updated 3 months ago
- Benchmarking library for RAG ☆255 · Updated last week
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆136 · Updated last year
- ☆147 · Updated last year
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆277 · Updated last year
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆575 · Updated this week
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆116 · Updated 6 months ago
- ☆43 · Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆120 · Updated 3 months ago
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆386 · Updated 2 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆113 · Updated last year
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆196 · Updated 5 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆366 · Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆446 · Updated last year
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆239 · Updated 4 months ago
- Sample notebooks and prompts for LLM evaluation ☆159 · Updated 3 months ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆285 · Updated 2 years ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆319 · Updated last year
- Knowledge Graph Retrieval Augmented Generation (KG-RAG) Eval Datasets ☆197 · Updated last year
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆62 · Updated last month
- Official repository for ORPO ☆469 · Updated last year