RouteWorks / RouterArena
RouterArena: An open framework for evaluating LLM routers, with standardized datasets, metrics, automated evaluation, and a live leaderboard.
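To make the evaluation idea concrete, here is a minimal sketch of scoring an LLM router on accuracy and cost over a labeled dataset. All names (`route`, `evaluate_router`, the toy costs and dataset) are illustrative assumptions, not RouterArena's actual API.

```python
# Toy sketch of router evaluation: a router maps each query to a model,
# and is scored on how often it picks the best model and what it spends.
# None of these names come from RouterArena; they are for illustration only.

def route(query: str) -> str:
    """Toy router: send long queries to a large model, short ones to a small one."""
    return "large-model" if len(query.split()) > 8 else "small-model"

# Hypothetical per-call costs and a tiny labeled dataset (query, best model).
COST = {"small-model": 0.01, "large-model": 0.10}
DATASET = [
    ("What is 2 + 2?", "small-model"),
    ("Summarize the trade-offs between speculative decoding "
     "and quantization for LLM inference.", "large-model"),
]

def evaluate_router(router, dataset):
    """Score a router by routing accuracy and total cost over a dataset."""
    correct, cost = 0, 0.0
    for query, best_model in dataset:
        choice = router(query)
        correct += (choice == best_model)
        cost += COST[choice]
    return {"accuracy": correct / len(dataset), "cost": cost}

print(evaluate_router(route, DATASET))
```

A real harness would replace the toy labels with benchmark responses graded per model, but the accuracy-versus-cost trade-off it surfaces is the same quantity a router leaderboard ranks on.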
☆59 · Updated this week
Alternatives and similar repositories for RouterArena
Users interested in RouterArena are comparing it to the repositories listed below.
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆389 · Updated last week
- [NeurIPS 2025] A simple extension to vLLM that speeds up reasoning models without training. ☆217 · Updated 7 months ago
- Dynamic Context Selection for Efficient Long-Context LLMs ☆53 · Updated 8 months ago
- AI-Driven Research Systems (ADRS) ☆117 · Updated last month
- Easy, Fast, and Scalable Multimodal AI ☆92 · Updated this week
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆153 · Updated last week
- 🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆112 · Updated 2 months ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆473 · Updated last month
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆59 · Updated 2 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× reduction in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆183 · Updated 3 weeks ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR) ☆88 · Updated 10 months ago
- [DAI 2025] Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing ☆198 · Updated last month
- Data Synthesis for Deep Research Based on Semi-Structured Data ☆191 · Updated last month
- Kinetics: Rethinking Test-Time Scaling Laws ☆85 · Updated 6 months ago
- An early-stage research prototype of an expert-parallel load balancer for MoE models based on linear programming. ☆485 · Updated 2 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆207 · Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆67 · Updated last year
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆101 · Updated 3 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆39 · Updated 11 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆519 · Updated 11 months ago