amazon-science / auto-rag-eval
Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"
☆80 · Updated last year
Alternatives and similar repositories for auto-rag-eval
Users interested in auto-rag-eval are comparing it to the libraries listed below.
- RefChecker provides an automatic checking pipeline and a benchmark dataset for detecting fine-grained hallucinations generated by Large Language Models. ☆374 · Updated last month
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆157 · Updated last year
- ☆144 · Updated 11 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆186 · Updated 6 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆107 · Updated last year
- ☆39 · Updated 11 months ago
- Codebase accompanying the "Summary of a Haystack" paper. ☆78 · Updated 9 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆112 · Updated 9 months ago
- ☆45 · Updated 10 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆130 · Updated last year
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆135 · Updated last year
- A generative AI-powered framework for testing virtual agents. ☆254 · Updated 2 months ago
- ☆44 · Updated 7 months ago
- ☆178 · Updated 10 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate. ☆111 · Updated 9 months ago
- Repo for the LegalBench-RAG paper: https://arxiv.org/abs/2408.10343 ☆97 · Updated 3 weeks ago
- ☆75 · Updated 5 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆472 · Updated last week
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ☆235 · Updated 10 months ago
- A large-scale multilingual dataset for Information Retrieval. Thorough human annotations across 18 diverse languages. ☆188 · Updated 10 months ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆144 · Updated 6 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆425 · Updated last year
- Benchmarking library for RAG ☆209 · Updated 2 weeks ago
- The official repository for the paper "Evaluation of Retrieval-Augmented Generation: A Survey". ☆160 · Updated 2 months ago
- Sample notebooks and prompts for LLM evaluation ☆135 · Updated 2 weeks ago
- Repository for "MultiHop-RAG: A Dataset for Evaluating Retrieval-Augmented Generation Across Documents" (COLM 2024) ☆330 · Updated 2 months ago
- Vision Document Retrieval (ViDoRe) benchmark; evaluation code for the ColPali paper. ☆213 · Updated 3 weeks ago
- Benchmark baseline for retrieval QA applications ☆115 · Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents using an Elo ranker ☆112 · Updated 2 weeks ago
- Comprehensive benchmark for RAG ☆194 · Updated last week