Mercidaiha / IRT-Router

[ACL'25] Code for the paper "IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory"

☆18

Alternatives and similar repositories for IRT-Router

Users who are interested in IRT-Router are comparing it to the repositories listed below.

LiuXiaoxuanPKU / OSD
☆60 Updated 11 months ago
pettingllms-ai / PettingLLMs
An RL framework for multi-LLM agent systems
☆63 Updated last month
ruipeterpan / specreason
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆58 Updated last month
uservan / speculative_thinking
☆29 Updated last month
hdong920 / GRIFFIN
☆39 Updated last year
pan-x-c / EE-LLM
EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).
☆70 Updated last year
thunlp / FR-Spec
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆47 Updated 4 months ago
henryzhongsc / longctx_bench
Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o…
☆86 Updated 8 months ago
Jingyu6 / speculative_prefill
☆46 Updated 6 months ago
zyxxmu / cam
PyTorch implementation of our ICML 2024 paper, "CaM: Cache Merging for Memory-efficient LLMs Inference"
☆47 Updated last year
ZO-Bench / ZO-LLM
[ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark".
☆117 Updated 4 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] A simple extension to vLLM that helps you speed up reasoning models without training.
☆206 Updated 5 months ago
hyx1999 / SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆36 Updated 9 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆130 Updated 3 weeks ago
Infini-AI-Lab / APE
☆34 Updated 9 months ago
Persdre / NeurIPS-2024-LLM-Papers
Accepted LLM Papers in NeurIPS 2024
☆37 Updated last year
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆64 Updated last year
shadowpa0327 / Palu
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
☆146 Updated 9 months ago
BaohaoLiao / RSD
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
☆51 Updated 6 months ago
falcon-xu / early-exit-papers
A curated list of early-exiting papers (LLM, CV, NLP, etc.)
☆68 Updated last year
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆114 Updated 3 months ago
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆93 Updated 2 years ago
RLsys-Foundation / APRIL
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra…
☆43 Updated last month
dilab-zju / self-speculative-decoding
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
☆209 Updated 9 months ago
yaof20 / Flash-RL
Implementation of FP8/INT8 rollout for RL training without a performance drop.
☆275 Updated 2 weeks ago
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆99 Updated last week
Jikai0Wang / Speculative_CoT
☆19 Updated 6 months ago
Guangxuan-Xiao / GSM8K-eval
☆54 Updated 2 years ago
machilusZ / FastGen
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
☆41 Updated last year
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆63 Updated last year