☆64 · Dec 3, 2024 · Updated last year
Alternatives and similar repositories for OSD
Users that are interested in OSD are comparing it to the libraries listed below
- An Attention Superoptimizer ☆22 · Jan 20, 2025 · Updated last year
- ☆28 · May 24, 2025 · Updated 9 months ago
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ☆42 · Feb 13, 2025 · Updated last year
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆143 · Dec 4, 2024 · Updated last year
- ☆66 · Nov 4, 2024 · Updated last year
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆36 · Aug 29, 2025 · Updated 6 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆72 · Nov 4, 2024 · Updated last year
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling ☆52 · Jul 15, 2025 · Updated 7 months ago
- Multi-Candidate Speculative Decoding ☆39 · Apr 22, 2024 · Updated last year
- Official code for GliDe with a CaPE ☆20 · Aug 13, 2024 · Updated last year
- ☆35 · Jun 22, 2024 · Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆147 · Dec 23, 2025 · Updated 2 months ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆1,131 · Jan 24, 2026 · Updated last month
- Implementation of AdaCQR (COLING 2025) ☆13 · Dec 30, 2024 · Updated last year
- ☆26 · Aug 31, 2023 · Updated 2 years ago
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an … ☆47 · Jun 1, 2024 · Updated last year
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆210 · Sep 21, 2024 · Updated last year
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆369 · Apr 22, 2025 · Updated 10 months ago
- Fast inference from large language models via speculative decoding