☆52 · Aug 14, 2024 · Updated last year
Alternatives and similar repositories for rag-qa-arena
Users who are interested in rag-qa-arena are comparing it to the repositories listed below.
- ☆20 · Mar 22, 2024 · Updated last year
- ☆15 · Feb 21, 2024 · Updated 2 years ago
- ☆26 · Nov 7, 2022 · Updated 3 years ago
- [KDD24-ADS] R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models ☆11 · Apr 9, 2024 · Updated last year
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs ☆17 · May 21, 2025 · Updated 9 months ago
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20… ☆28 · Nov 2, 2023 · Updated 2 years ago
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ☆35 · Updated this week
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- ☆21 · Jun 12, 2024 · Updated last year
- ☆33 · Jul 9, 2025 · Updated 8 months ago
- ☆17 · Jul 5, 2022 · Updated 3 years ago
- [ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners ☆22 · Jun 6, 2025 · Updated 9 months ago
- A holistic benchmark for LLM abstention ☆73 · Aug 27, 2025 · Updated 6 months ago
- ☆45 · Jan 21, 2025 · Updated last year
- Visualize constituent and dependency parses as PDF or image formats, through GraphViz. ☆32 · Feb 11, 2021 · Updated 5 years ago
- ☆16 · Sep 9, 2023 · Updated 2 years ago
- ☆17 · May 28, 2024 · Updated last year
- Source code for the EMNLP 2020 long paper <Token-level Adaptive Training for Neural Machine Translation>. ☆20 · Oct 28, 2022 · Updated 3 years ago
- ☆31 · Sep 12, 2025 · Updated 5 months ago
- Reproducible Language Agent Research ☆34 · Jun 25, 2025 · Updated 8 months ago
- Merging Generated and Retrieved Knowledge for Open-Domain QA (EMNLP 2023) ☆22 · Oct 8, 2023 · Updated 2 years ago
- Comprehensive benchmark for RAG ☆266 · Jun 14, 2025 · Updated 8 months ago
- ☆27 · Mar 21, 2024 · Updated last year
- This is the official repo for the paper "AMO-Bench: Large Language Models Still Struggle in High School Math Competitions". ☆64 · Feb 6, 2026 · Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- State-of-the-art architecture for Plant Disease Detection using Deep Learning. ☆10 · Jul 4, 2022 · Updated 3 years ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Updated this week
- ☆28 · May 23, 2024 · Updated last year
- YiSi: A Semantic Machine Translation Evaluation Metric for Evaluating Languages with Different Levels of Available Resources ☆26 · May 28, 2019 · Updated 6 years ago
- ☆25 · May 3, 2024 · Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆40 · Mar 31, 2025 · Updated 11 months ago
- The TechQA dataset -- http://ibm.biz/Tech_QA ☆26 · Sep 17, 2025 · Updated 5 months ago
- Agentic Research and Evaluation Suite ☆77 · Feb 26, 2026 · Updated last week
- Code and data for "The Power of Noise: Redefining Retrieval for RAG Systems" ☆69 · Jul 3, 2025 · Updated 8 months ago
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆46 · Jul 24, 2025 · Updated 7 months ago
- ☆76 · Jan 24, 2025 · Updated last year
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- This repo contains code to reproduce some of the results presented in the paper "SentenceMIM: A Latent Variable Language Model" ☆28 · Jun 22, 2022 · Updated 3 years ago
- Code and packages for the paper "Evaluating Retrieval Quality in Retrieval-Augmented Generation" ☆30 · May 21, 2025 · Updated 9 months ago