microsoft / benchmark-qed
Automated benchmarking of Retrieval-Augmented Generation (RAG) systems
☆50Updated last month
Alternatives and similar repositories for benchmark-qed
Users who are interested in benchmark-qed are comparing it to the libraries listed below.
- Ranking LLMs on agentic tasks☆194Updated last month
- Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning☆410Updated 3 weeks ago
- Open source RAG evaluation package☆312Updated last week
- ☆146Updated last year
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆120Updated 8 months ago
- ☆232Updated 3 months ago
- A method for steering LLMs to better follow instructions☆54Updated 2 months ago
- Enterprise-grade memory framework for LLMs featuring GPU-optimized inference, vector storage, and automated scaling. Enables hyper-person…☆88Updated 5 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆168Updated last week
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.☆135Updated last month
- The Granite Guardian models are designed to detect risks in prompts and responses.☆119Updated last week
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"☆228Updated 2 weeks ago
- ☆271Updated 7 months ago
- An open-source tool for LLM prompt optimization.☆657Updated 2 weeks ago
- Tutorial for building an LLM router☆230Updated last year
- A curated list of awesome approaches to AI model routing☆160Updated 6 months ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".☆241Updated last year
- UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities☆124Updated 4 months ago
- Official Repo for CRMArena and CRMArena-Pro☆119Updated 3 months ago
- This repository contains the toolkit for replicating results from our technical report.☆148Updated last month
- ☆79Updated 2 weeks ago
- ☆264Updated 3 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆102Updated 6 months ago
- Rank LLMs, RAG systems, and prompts using automated head-to-head evaluation☆105Updated 10 months ago
- ☆78Updated 9 months ago
- Catch MCP server issues before your agents do.☆121Updated this week
- Official code of the ACL 2025 paper "SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation"☆125Updated 2 months ago
- A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)☆731Updated 3 months ago
- ☆95Updated 6 months ago