LiuXiaoxuanPKU / OSD
☆64 · Updated Dec 3, 2024
Alternatives and similar repositories for OSD
Users interested in OSD are comparing it to the libraries listed below.
- An Attention Superoptimizer · ☆22 · Updated Jan 20, 2025
- ☆28 · Updated May 24, 2025
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton · ☆40 · Updated Feb 13, 2025
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆142 · Updated Dec 4, 2024
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank · ☆70 · Updated Nov 4, 2024
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling · ☆50 · Updated Jul 15, 2025
- Multi-Candidate Speculative Decoding · ☆39 · Updated Apr 22, 2024
- Official code for GliDe with a CaPE · ☆20 · Updated Aug 13, 2024
- ☆34 · Updated Jun 22, 2024
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length · ☆147 · Updated Dec 23, 2025
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ · ☆1,121 · Updated Jan 24, 2026
- Implementation of AdaCQR (COLING 2025) · ☆13 · Updated Dec 30, 2024
- ☆26 · Updated Aug 31, 2023
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an … · ☆46 · Updated Jun 1, 2024
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable · ☆209 · Updated Sep 21, 2024
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) · ☆367 · Updated Apr 22, 2025
- Fast inference from large language models via speculative decoding · ☆886 · Updated Aug 22, 2024
- 📜 Paper list on decoding methods for LLMs and LVLMs · ☆68 · Updated Nov 7, 2025
- ☆17 · Updated Jan 27, 2025
- ☆12 · Updated Oct 16, 2022
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache · ☆356 · Updated Nov 20, 2025
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ☆123 · Updated Jul 4, 2025
- The official repo of continuous speculative decoding · ☆31 · Updated Mar 28, 2025
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration · ☆61 · Updated Feb 21, 2025
- [NeurIPS 2025] A simple extension built on vLLM to speed up reasoning models without training · ☆220 · Updated May 31, 2025
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference · ☆160 · Updated Oct 13, 2025
- SpotServe: Serving Generative Large Language Models on Preemptible Instances · ☆135 · Updated Feb 22, 2024
- Keyformer proposes KV cache reduction through key-token identification, without the need for fine-tuning · ☆58 · Updated Mar 26, 2024
- ☆16 · Updated Jan 21, 2023
- Artifacts for our SIGCOMM'23 paper Ditto · ☆15 · Updated Oct 17, 2023
- Auto Build Deepspeed · ☆19 · Updated Oct 10, 2025
- ☆17 · Updated May 10, 2024
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup · ☆35 · Updated Jan 9, 2023
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization · ☆37 · Updated Sep 24, 2024
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…" · ☆65 · Updated Jun 26, 2024
- Reading seminar in Harvard Cloud Networking and Systems Group · ☆16 · Updated Aug 29, 2022
- Vortex: A Flexible and Efficient Sparse Attention Framework · ☆46 · Updated Jan 21, 2026
- Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation · ☆35 · Updated Feb 27, 2024
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding · ☆276 · Updated Aug 31, 2024