📰 Must-read papers and blogs on Speculative Decoding ⚡️
⭐1,180 · Mar 31, 2026 · Updated last week
Alternatives and similar repositories for SpeculativeDecodingPapers
Users interested in SpeculativeDecodingPapers are comparing it to the repositories listed below.
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ⭐381 · Apr 22, 2025 · Updated 11 months ago
- Fast inference from large language models via speculative decoding ⭐911 · Aug 22, 2024 · Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25). ⭐2,253 · Feb 20, 2026 · Updated last month
- [ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration ⭐65 · Feb 21, 2025 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ⭐300 · Dec 22, 2024 · Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ⭐223 · Feb 13, 2025 · Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ⭐155 · Dec 23, 2025 · Updated 3 months ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ⭐2,720 · Jun 25, 2024 · Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ⭐216 · Mar 5, 2026 · Updated last month
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ⭐279 · Aug 31, 2024 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ⭐1,327 · Mar 6, 2025 · Updated last year
- Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) ⭐46 · Dec 9, 2023 · Updated 2 years ago
- Multi-Candidate Speculative Decoding ⭐40 · Apr 22, 2024 · Updated last year
- Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS) ⭐56 · Mar 14, 2025 · Updated last year
- 📖 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. ⭐5,130 · Updated this week
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ⭐116 · Mar 20, 2025 · Updated last year
- A curated list for Efficient Large Language Models ⭐1,977 · Jun 17, 2025 · Updated 9 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ⭐679 · Feb 24, 2026 · Updated last month
- Scalable and robust tree-based speculative decoding algorithm ⭐376 · Jan 28, 2025 · Updated last year
- Awesome LLM compression research papers and tools. ⭐1,796 · Feb 23, 2026 · Updated last month
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…" ⭐68 · Jun 26, 2024 · Updated last year
- [ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification ⭐79 · Jul 14, 2025 · Updated 8 months ago
- Paper list for Efficient Reasoning. ⭐863 · Apr 4, 2026 · Updated last week
- ⭐28 · May 24, 2025 · Updated 10 months ago
- ⭐64 · Dec 3, 2024 · Updated last year
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training ⭐1,870 · Mar 25, 2026 · Updated 2 weeks ago
- FlashInfer: Kernel Library for LLM Serving ⭐5,273 · Apr 4, 2026 · Updated last week
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ⭐535 · Feb 10, 2025 · Updated last year
- A throughput-oriented high-performance serving framework for LLMs ⭐953 · Mar 29, 2026 · Updated last week
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ⭐822 · Mar 6, 2025 · Updated last year
- Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton ⭐47 · Feb 13, 2025 · Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ⭐381 · Nov 20, 2025 · Updated 4 months ago
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ⭐145 · Dec 4, 2024 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ⭐470 · May 30, 2025 · Updated 10 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ⭐380 · Jul 10, 2025 · Updated 9 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ⭐110 · Feb 29, 2024 · Updated 2 years ago
- My learning notes for ML SYS. ⭐5,863 · Apr 3, 2026 · Updated last week
- ⭐602 · Aug 23, 2024 · Updated last year
- Code for our paper "Enhancing Continual Relation Extraction via Classifier Decomposition" (Findings of ACL 2023) ⭐10 · Nov 29, 2023 · Updated 2 years ago
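Most of the repositories above build on the same core idea: a cheap draft model proposes several tokens, and the expensive target model verifies them in a single pass, accepting a drafted token x with probability min(1, p(x)/q(x)) and resampling from the residual distribution on rejection, so the output matches the target model's distribution exactly. A minimal sketch of that accept/reject rule, using hypothetical toy distributions over a three-token vocabulary in place of real language models (the function names and probabilities here are illustrative assumptions, not any repository's API):

```python
import random

random.seed(0)

VOCAB = ["a", "b", "c"]

def draft_probs(prefix):
    # Hypothetical cheap draft model: near-uniform over the toy vocabulary.
    return {"a": 0.4, "b": 0.35, "c": 0.25}

def target_probs(prefix):
    # Hypothetical expensive target model: prefers "a".
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def speculative_step(prefix, k=4):
    """Draft k tokens, then verify them against the target model.

    Each drafted token x is accepted with probability min(1, p(x)/q(x));
    on the first rejection we resample once from the normalized residual
    max(0, p - q) and stop. This keeps the accepted sequence distributed
    exactly as the target model alone would produce it (lossless).
    (The extra "bonus" token sampled after k acceptances in the full
    algorithm is omitted here for brevity.)"""
    drafted = []
    for _ in range(k):
        q = draft_probs(prefix + drafted)
        drafted.append(random.choices(VOCAB, weights=[q[v] for v in VOCAB])[0])

    accepted = []
    for x in drafted:
        p = target_probs(prefix + accepted)
        q = draft_probs(prefix + accepted)
        if random.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)  # target agrees often enough: keep the token
        else:
            # Rejected: resample from the residual distribution and stop.
            residual = {v: max(0.0, p[v] - q[v]) for v in VOCAB}
            total = sum(residual.values())
            weights = [residual[v] / total for v in VOCAB]
            accepted.append(random.choices(VOCAB, weights=weights)[0])
            break
    return accepted

print(speculative_step(["a"]))
```

In one call the target distribution is consulted at most k times but, in a real system, those evaluations happen in a single batched forward pass, which is where the speedup comes from.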