myscale / Retrieval-QA-Benchmark
Benchmark baseline for retrieval QA applications
☆104Updated 11 months ago
Alternatives and similar repositories for Retrieval-QA-Benchmark:
Users who are interested in Retrieval-QA-Benchmark are comparing it to the libraries listed below
- This is the code repo for our paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents".☆104Updated 5 months ago
- [Preprint] Learning to Filter Context for Retrieval-Augmented Generation☆190Updated 11 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆131Updated 4 months ago
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark☆133Updated 3 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use?☆152Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆136Updated 4 months ago
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation.☆122Updated 8 months ago
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.☆151Updated last year
- Generative Judge for Evaluating Alignment☆230Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …☆254Updated last year
- Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL23☆195Updated 9 months ago
- Codebase accompanying the Summary of a Haystack paper.☆75Updated 6 months ago
- "Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases" b…☆42Updated last year
- Scripts for generating synthetic finetuning data for reducing sycophancy.☆109Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities.☆150Updated last year
- ☆138Updated last month
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆179Updated 5 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆104Updated 6 months ago
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context"☆65Updated 7 months ago
- Comprehensive benchmark for RAG☆144Updated 4 months ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"☆200Updated 4 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆229Updated last month
- Reformatted Alignment☆115Updated 6 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA☆116Updated 4 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆94Updated last year
- Code implementation of synthetic continued pretraining☆95Updated 2 months ago
- ☆277Updated last year
- Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.☆147Updated last year
- ☆142Updated 11 months ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models"☆160Updated 3 months ago