Efficient LLM query routing via multi-sampling. BEST-Route selects both model and number of responses based on query difficulty, cutting costs by up to 60% with <1% performance drop. From the paper: https://arxiv.org/abs/2506.22716
☆53 · Apr 8, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for best-route-llm
Users who are interested in best-route-llm are comparing it to the libraries listed below.
- Compression for Foundation Models ☆35 · Jul 21, 2025 · Updated 9 months ago
- Prototypes and experiments for WG Device Management. ☆15 · Apr 1, 2026 · Updated last month
- 📖 The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google's BERT, OpenAI GPT & GPT-2, Google/CMU Transformer… ☆11 · May 30, 2019 · Updated 6 years ago
- ☆89 · Oct 17, 2025 · Updated 6 months ago
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning ☆129 · Dec 30, 2025 · Updated 4 months ago
- The code of RouterDC ☆71 · Apr 14, 2025 · Updated last year
- A class for synchronizing sensor readings to the system clock ☆11 · Oct 25, 2018 · Updated 7 years ago
- YEY Blog -> ☆12 · Jun 26, 2025 · Updated 10 months ago
- Framework for Cost-Effective Language Model Choice ☆16 · Dec 12, 2023 · Updated 2 years ago
- Code for SIGKDD2025 paper: An Efficient Diffusion-based Non-Autoregressive Solver for Traveling Salesman Problem ☆14 · Jan 28, 2025 · Updated last year
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training" ☆11 · Oct 27, 2025 · Updated 6 months ago
- ☆13 · Jan 14, 2020 · Updated 6 years ago
- Prompt-to-Leaderboard ☆277 · May 9, 2025 · Updated 11 months ago
- ☆16 · Jan 14, 2025 · Updated last year
- A platform that provides users with easy access to AI services developed by Montimage and usage of explainable AI techniques (e.g., LIME,… ☆10 · Feb 17, 2026 · Updated 2 months ago
- An evaluation framework for data center traffic engineering. ☆14 · Jul 28, 2024 · Updated last year
- Codes for Merging Large Language Models ☆36 · Aug 7, 2024 · Updated last year
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation ☆17 · Sep 2, 2024 · Updated last year
- Code for the paper "Age of Information Analysis in Edge Computing Servers" ☆22 · Feb 12, 2024 · Updated 2 years ago
- Survey on LLM Inference via Search (TMLR 2025) ☆14 · May 6, 2025 · Updated last year
- A Transformer-based model for read-level DNA methylation pattern identification and tumour deconvolution ☆44 · Mar 12, 2025 · Updated last year
- Microsoft's open source max-min fair solver for cluster scheduling and traffic engineering ☆19 · Apr 13, 2026 · Updated 3 weeks ago
- ☆12 · Mar 13, 2023 · Updated 3 years ago
- Packet-level simulation code to model Opera and other networks from the 2020 NSDI paper "Expanding across time to deliver bandwidth effic… ☆15 · Jun 10, 2020 · Updated 5 years ago
- Implementation of self-certainty as an extension of ZeroEval Project ☆36 · May 31, 2025 · Updated 11 months ago
- Introducing: Large Scale Capacity Consensus! ☆14 · Nov 1, 2021 · Updated 4 years ago
- ☆16 · Apr 30, 2026 · Updated last week
- ☆16 · Feb 10, 2023 · Updated 3 years ago
- A Better Way to Attend: Attention with Trees for Video Question Answering ☆25 · Mar 25, 2019 · Updated 7 years ago
- INFOCOM 2024: Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference ☆34 · Oct 13, 2024 · Updated last year
- The official repository of ICCV 2025 paper "CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning". ☆18 · Nov 26, 2025 · Updated 5 months ago
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering ☆25 · Nov 4, 2020 · Updated 5 years ago
- Burstable Cloud Scheduler ☆17 · Jun 6, 2024 · Updated last year
- LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure ☆255 · Apr 30, 2026 · Updated last week
- ☆50 · Nov 9, 2025 · Updated 5 months ago
- This is the implementation for the paper "LARGE LANGUAGE MODEL CASCADES WITH MIXTURE OF THOUGHT REPRESENTATIONS FOR COST-EFFICIENT REA… ☆31 · Jun 1, 2024 · Updated last year
- [DAI 2025] Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing ☆205 · Dec 11, 2025 · Updated 4 months ago
- Mitigating Routing Update Overhead for Traffic Engineering by Combining Destination-based Routing with Reinforcement Learning ☆15 · Oct 16, 2022 · Updated 3 years ago