Not-Diamond / awesome-ai-model-routing
A curated list of awesome approaches to AI model routing
☆94 · Updated this week
Alternatives and similar repositories for awesome-ai-model-routing:
Users interested in awesome-ai-model-routing are comparing it to the repositories listed below.
- Routing on Random Forest (RoRF) ☆136 · Updated 6 months ago
- AWM: Agent Workflow Memory ☆252 · Updated last month
- Tutorial for building an LLM router ☆186 · Updated 8 months ago
- Sandboxed code execution for AI agents, locally or in the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆129 · Updated last week
- ☆376 · Updated 2 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆119 · Updated last month
- Synthetic Data for LLM Fine-Tuning ☆113 · Updated last year
- Code for the ScribeAgent paper ☆54 · Updated 3 weeks ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ☆338 · Updated 9 months ago
- Testing and evaluation framework for voice agents ☆98 · Updated last month
- Code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆109 · Updated 9 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆165 · Updated 3 weeks ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀 ☆92 · Updated 5 months ago
- Code and Data for Tau-Bench ☆358 · Updated 2 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆425 · Updated 6 months ago
- Task-based Agentic Framework using StrictJSON as the core ☆451 · Updated last month
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆219 · Updated 5 months ago
- Function Calling Benchmark & Testing ☆84 · Updated 8 months ago
- Scaling inference-time compute for LLM-as-a-judge, automated evaluations, guardrails, and reinforcement learning. ☆189 · Updated last week
- ☆160 · Updated 7 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆104 · Updated 6 months ago
- ☆106 · Updated last week
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆103 · Updated last month
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆83 · Updated last week
- A simple unified framework for evaluating LLMs ☆206 · Updated 2 weeks ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆283 · Updated last week
- Readymade evaluators for agent trajectories ☆78 · Updated this week
- ☆143 · Updated this week
- ☆185 · Updated last month
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training" ☆112 · Updated last week
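For context on what the routers listed above do, here is a minimal illustrative sketch of the core idea behind model routing: send easy prompts to a cheap model and hard prompts to a strong one. The model names, the heuristic score, and the threshold are all hypothetical placeholders, not the approach of any specific repository in this list (RoRF, for example, trains a random forest instead of using a hand-written heuristic).

```python
def complexity_score(prompt: str) -> float:
    """Crude proxy for prompt difficulty: length plus reasoning-cue keywords.
    Purely illustrative; real routers typically use a trained classifier."""
    cues = ("why", "prove", "step by step", "compare", "analyze")
    cue_hits = sum(cue in prompt.lower() for cue in cues)
    return min(1.0, len(prompt) / 500 + 0.2 * cue_hits)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Pick a model tier based on the heuristic score.
    'cheap-model' and 'strong-model' are placeholder names."""
    return "strong-model" if complexity_score(prompt) >= threshold else "cheap-model"

print(route("What is 2 + 2?"))                                   # short, no cues
print(route("Analyze these proofs step by step and compare them"))  # several cues
```

A trained router (as in RoRF or the ROUTERBENCH setups above) replaces `complexity_score` with a learned model, but the routing decision itself keeps this same shape.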