Mercidaiha / IRT-Router
[ACL'25] Code for the ACL'25 paper "IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory"
☆17 · Updated 8 months ago
Alternatives and similar repositories for IRT-Router
Users interested in IRT-Router are comparing it to the repositories listed below.
- ☆54 · Updated 2 years ago
- [ICML '24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆113 · Updated 3 months ago
- ☆67 · Updated 6 months ago
- A curated list of early exiting (LLM, CV, NLP, etc.) ☆67 · Updated last year
- ☆60 · Updated 10 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆86 · Updated 8 months ago
- Repo for EmbedLLM: Learning Compact Representations of Large Language Models ☆22 · Updated last month
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25] ☆56 · Updated 3 weeks ago
- [ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models ☆24 · Updated 3 months ago
- ☆28 · Updated 2 weeks ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆121 · Updated this week
- DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting ☆17 · Updated 7 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆61 · Updated 11 months ago
- ☆44 · Updated 5 months ago
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆44 · Updated 3 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆258 · Updated last year
- ☆53 · Updated last year
- Official implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS) ☆47 · Updated 7 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆324 · Updated 6 months ago
- ☆46 · Updated 11 months ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆114 · Updated 2 months ago
- ☆178 · Updated 5 months ago
- Official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act… ☆17 · Updated last year
- Code for the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆206 · Updated 8 months ago
- Official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS 2024) ☆23 · Updated last year
- 🔥 How to efficiently and effectively compress CoTs or directly generate concise CoTs during inference while maintaining the reasonin… ☆63 · Updated 5 months ago
- 📜 Paper list on decoding methods for LLMs and LVLMs ☆61 · Updated 4 months ago
- PyTorch implementation of our ICML 2024 paper "CaM: Cache Merging for Memory-efficient LLMs Inference" ☆47 · Updated last year
- Official implementation of "SAM-Decoding: Speculative Decoding via Suffix Automaton" ☆34 · Updated 8 months ago
- ☆22 · Updated last year