RouteWorks / RouterArena
RouterArena: An open framework for evaluating LLM routers with standardized datasets, metrics, an automated framework, and a live leaderboard.
☆61 Updated this week
Alternatives and similar repositories for RouterArena
Users interested in RouterArena are comparing it to the libraries listed below.
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆410 Updated last month
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training. ☆218 Updated 8 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆196 Updated 2 weeks ago
- ☆36 Updated 11 months ago
- ☆110 Updated 4 months ago
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆155 Updated 3 weeks ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆481 Updated 2 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆104 Updated 4 months ago
- ☆50 Updated 4 months ago
- 🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆116 Updated 2 months ago
- AI-Driven Research Systems (ADRS) ☆117 Updated last month
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆63 Updated 3 months ago
- Dynamic Context Selection for Efficient Long-Context LLMs ☆54 Updated 8 months ago
- ☆64 Updated 8 months ago
- ☆77 Updated last week
- (ACL 2025 Oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation ☆34 Updated 8 months ago
- Easy, Fast, and Scalable Multimodal AI ☆109 Updated this week
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆70 Updated 7 months ago
- [NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning ☆115 Updated last month
- [DAI 2025] Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing ☆198 Updated last month
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆248 Updated last year
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆141 Updated last year
- [ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness. ☆55 Updated 9 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆459 Updated this week
- Memory-optimized Mixture of Experts ☆73 Updated 6 months ago
- KV cache compression for high-throughput LLM inference ☆151 Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆69 Updated last year
- Data Synthesis for Deep Research Based on Semi-Structured Data ☆197 Updated last month
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆160 Updated 3 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆289 Updated 3 months ago