IBM / mt-rag-benchmark
Multi-Turn RAG Benchmark
☆76 Updated this week
Alternatives and similar repositories for mt-rag-benchmark
Users who are interested in mt-rag-benchmark are comparing it to the repositories listed below.
- Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022 ☆163 Updated last year
- [ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆167 Updated this week
- ☆186 Updated 2 months ago
- Code for the ACL 2023 long paper - Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering ☆37 Updated 2 years ago
- ☆121 Updated 2 years ago
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ☆54 Updated 2 months ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval ☆52 Updated 3 months ago
- ☆41 Updated 7 months ago
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆122 Updated 7 months ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆198 Updated 9 months ago
- [Neurips2023] Source code for Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory ☆62 Updated 2 years ago
- 🌲 Code for our EMNLP 2023 paper - 🎄 "Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Mode… ☆51 Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆133 Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models ☆48 Updated 2 years ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆119 Updated last year
- [NAACL 2024] End-to-End Beam Retrieval for Multi-Hop Question Answering ☆113 Updated last year
- Token-level Reference-free Hallucination Detection ☆96 Updated 2 years ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆138 Updated last year
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆105 Updated last year
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting" ☆113 Updated 2 years ago
- Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models" ☆56 Updated 2 years ago
- TAT-QA (Tabular And Textual dataset for Question Answering) contains 16,552 questions associated with 2,757 hybrid contexts from real-wor… ☆117 Updated 9 months ago
- The code and data for paper "Large Language Models are few(1)-shot Table Reasoners" [EACL2023] ☆47 Updated last year
- ☆78 Updated last year
- [ACL 2022] A hierarchical table dataset for question answering and data-to-text generation. ☆94 Updated 5 months ago
- First explanation metric (diagnostic report) for text generation evaluation ☆62 Updated 6 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆41 Updated 2 years ago
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆101 Updated 2 years ago
- Test-time compute in information retrieval ☆42 Updated 2 months ago
- ☆74 Updated last year