amazon-science / auto-rag-eval
Code repo for the ICML 2024 paper "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation"
☆66 Updated 5 months ago
Related projects
Alternatives and complementary repositories for auto-rag-eval
- ☆112 Updated last month
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 Updated 8 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆134 Updated 10 months ago
- awesome synthetic (text) datasets ☆242 Updated 3 weeks ago
- ☆129 Updated 3 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 Updated 3 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆115 Updated last month
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆33 Updated 5 months ago
- ☆22 Updated last month
- ☆133 Updated 4 months ago
- RefChecker provides automatic checking pipeline and benchmark dataset for detecting fine-grained hallucinations generated by Large Langua… ☆303 Updated 2 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆148 Updated last month
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆104 Updated last month
- [Preprint] Learning to Filter Context for Retrieval-Augmented Generation ☆183 Updated 7 months ago
- Let's build better datasets, together! ☆206 Updated this week
- minimal pytorch implementation of bm25 (with sparse tensors) ☆90 Updated 8 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 Updated 3 weeks ago
- We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in … ☆45 Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆191 Updated this week
- Code for Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks ☆47 Updated 7 months ago
- ☆126 Updated 7 months ago
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆96 Updated 7 months ago
- ☆48 Updated 2 weeks ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆111 Updated 8 months ago
- Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought" ☆92 Updated 10 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆61 Updated 4 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- Repository for “PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers”, NAACL24 ☆126 Updated 5 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆82 Updated 2 months ago