awslabs / rag-qa-arena
☆51Aug 14, 2024Updated last year
Alternatives and similar repositories for rag-qa-arena
Users who are interested in rag-qa-arena are comparing it to the libraries listed below.
- ☆20Mar 22, 2024Updated last year
- ☆15Feb 21, 2024Updated last year
- [KDD24-ADS] R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models☆11Apr 9, 2024Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 7 months ago
- ☆26Nov 7, 2022Updated 3 years ago
- ☆17May 21, 2025Updated 8 months ago
- Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization"☆11Sep 20, 2024Updated last year
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20…☆28Nov 2, 2023Updated 2 years ago
- ☆32Feb 1, 2026Updated 2 weeks ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- ☆33Jul 9, 2025Updated 7 months ago
- [ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners☆22Jun 6, 2025Updated 8 months ago
- ☆21Jun 12, 2024Updated last year
- A holistic benchmark for LLM abstention☆69Aug 27, 2025Updated 5 months ago
- ☆43Jan 21, 2025Updated last year
- Visualize constituent and dependency parses as PDF or image formats, through GraphViz.☆32Feb 11, 2021Updated 5 years ago
- ☆16Sep 9, 2023Updated 2 years ago
- ☆31Sep 12, 2025Updated 5 months ago
- Reproducible Language Agent Research☆33Jun 25, 2025Updated 7 months ago
- Agentic Research and Evaluation Suite☆71Updated this week
- Comprehensive benchmark for RAG☆260Jun 14, 2025Updated 8 months ago
- ☆27Mar 21, 2024Updated last year
- This is the official repo for the paper "AMO-Bench: Large Language Models Still Struggle in High School Math Competitions".☆62Feb 6, 2026Updated last week
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021☆29Feb 1, 2023Updated 3 years ago
- YiSi: A Semantic Machine Translation Evaluation Metric for Evaluating Languages with Different Levels of Available Resources☆26May 28, 2019Updated 6 years ago
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆28Oct 28, 2024Updated last year
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 8, 2026Updated last week
- State-of-the-art architecture for Plant Disease Detection using Deep Learning.☆10Jul 4, 2022Updated 3 years ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆13Jun 28, 2025Updated 7 months ago
- ☆27May 23, 2024Updated last year
- ☆25May 3, 2024Updated last year
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆40Mar 31, 2025Updated 10 months ago
- Code and data for "The Power of Noise: Redefining Retrieval for RAG Systems"☆69Jul 3, 2025Updated 7 months ago
- KV Cache Steering for Inducing Reasoning in Small Language Models☆46Jul 24, 2025Updated 6 months ago
- ☆76Jan 24, 2025Updated last year
- This repo contains code to reproduce some of the results presented in the paper "SentenceMIM: A Latent Variable Language Model"☆28Jun 22, 2022Updated 3 years ago
- ☆18Jun 10, 2025Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- Codes and packages for the paper titled Evaluating Retrieval Quality in Retrieval-Augmented Generation.☆30May 21, 2025Updated 8 months ago