Efficient LLM query routing via multi-sampling. BEST-Route selects both model and number of responses based on query difficulty, cutting costs by up to 60% with <1% performance drop. From the paper: https://arxiv.org/abs/2506.22716
☆55Apr 8, 2026Updated last month
Alternatives and similar repositories for best-route-llm
Users that are interested in best-route-llm are comparing it to the libraries listed below.
- ☆34Feb 17, 2026Updated 3 months ago
- [ACL'25] Code for ACL'25 paper "IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory"☆32Feb 19, 2025Updated last year
- ☆23Feb 28, 2025Updated last year
- ☆91Oct 17, 2025Updated 7 months ago
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning☆132Dec 30, 2025Updated 4 months ago
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training"☆11Oct 27, 2025Updated 6 months ago
- ☆12Apr 23, 2026Updated last month
- ☆15Apr 13, 2024Updated 2 years ago
- ☆16Jan 14, 2025Updated last year
- A platform that provides users with easy access to AI services developed by Montimage and usage of explainable AI techniques (e.g., LIME,…☆10Feb 17, 2026Updated 3 months ago
- Codes for Merging Large Language Models☆36Aug 7, 2024Updated last year
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation☆17Sep 2, 2024Updated last year
- Survey on LLM Inference via Search (TMLR 2025)☆14May 6, 2025Updated last year
- Microsoft's open source max-min fair solver for cluster scheduling and traffic engineering☆19Apr 13, 2026Updated last month
- SplitBud is a Split Learning framework built upon Flower☆14Mar 22, 2025Updated last year
- ☆12Mar 13, 2023Updated 3 years ago
- Packet-level simulation code to model Opera and other networks from the 2020 NSDI paper "Expanding across time to deliver bandwidth effic…☆15Jun 10, 2020Updated 5 years ago
- ☆16Apr 30, 2026Updated 3 weeks ago
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines☆19Dec 8, 2023Updated 2 years ago
- ☆16Feb 10, 2023Updated 3 years ago
- A Better Way to Attend: Attention with Trees for Video Question Answering☆25Mar 25, 2019Updated 7 years ago
- Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated 2 years ago
- The official repository of ICCV 2025 paper "CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning".☆19Nov 26, 2025Updated 6 months ago
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.☆12Feb 11, 2024Updated 2 years ago
- Monitoring the health of ARR☆31Apr 4, 2026Updated last month
- ☆14Dec 20, 2024Updated last year
- Burstable Cloud Scheduler☆17Jun 6, 2024Updated last year
- 📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉☆16Mar 30, 2025Updated last year
- LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure☆288May 19, 2026Updated last week
- A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.☆12Oct 12, 2018Updated 7 years ago
- ☆15Aug 12, 2023Updated 2 years ago
- Mitigating Routing Update Overhead for Traffic Engineering by Combining Destination-based Routing with Reinforcement Learning☆15Oct 16, 2022Updated 3 years ago
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …☆13Dec 1, 2022Updated 3 years ago
- ☆16Apr 1, 2023Updated 3 years ago
- Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling☆13Mar 7, 2024Updated 2 years ago
- [TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…☆20Apr 27, 2026Updated last month
- ☆25Jul 30, 2025Updated 9 months ago
- Predict the performance of LLM inference services☆23Sep 18, 2025Updated 8 months ago
- Releasing the spot availability traces used in "Can't Be Late" paper.☆26Mar 31, 2024Updated 2 years ago