Explorations into some recent techniques surrounding speculative decoding
★300 · Dec 22, 2024 · Updated last year
Alternatives and similar repositories for speculative-decoding
Users interested in speculative-decoding are comparing it to the libraries listed below.
- Fast inference from large language models via speculative decoding · ★904 · Aug 22, 2024 · Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ · ★1,163 · Mar 9, 2026 · Updated 2 weeks ago
- Multi-Candidate Speculative Decoding · ★40 · Apr 22, 2024 · Updated last year
- Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023. · ★102 · Dec 2, 2024 · Updated last year
- [NeurIPS'23] Speculative Decoding with Big Little Decoder · ★96 · Feb 6, 2024 · Updated 2 years ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** · ★221 · Feb 13, 2025 · Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 · ★215 · Mar 5, 2026 · Updated 3 weeks ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. · ★99 · Aug 20, 2023 · Updated 2 years ago
- Cascade Speculative Drafting · ★33 · Apr 2, 2024 · Updated last year
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) · ★376 · Apr 22, 2025 · Updated 11 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) · ★65 · Sep 28, 2024 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding · ★1,324 · Mar 6, 2025 · Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25). · ★2,229 · Feb 20, 2026 · Updated last month
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by Deepmind · ★110 · Feb 29, 2024 · Updated 2 years ago
- ★15 · Aug 19, 2024 · Updated last year
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads · ★2,722 · Jun 25, 2024 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings · ★46 · May 23, 2023 · Updated 2 years ago
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification · ★76 · Jul 14, 2025 · Updated 8 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ★362 · Feb 5, 2026 · Updated last month
- ★599 · Aug 23, 2024 · Updated last year
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… · ★53 · Oct 22, 2023 · Updated 2 years ago
- ★28 · May 24, 2025 · Updated 10 months ago
- ★354 · Apr 2, 2024 · Updated last year
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training · ★1,864 · Updated this week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity · ★237 · Sep 24, 2023 · Updated 2 years ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding · ★278 · Aug 31, 2024 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ★145 · Dec 4, 2024 · Updated last year
- ★26 · Mar 14, 2024 · Updated 2 years ago
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs). · ★11 · May 24, 2024 · Updated last year
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin… · ★68 · Jun 26, 2024 · Updated last year
- Implementation of a holodeck, written in Pytorch · ★18 · Nov 1, 2023 · Updated 2 years ago
- Crawl & visualize ICLR papers and reviews. · ★18 · Nov 5, 2022 · Updated 3 years ago
- The official repo of continuous speculative decoding · ★32 · Mar 28, 2025 · Updated 11 months ago
- FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model] · ★48 · Feb 17, 2026 · Updated last month
- Serving multiple LoRA finetuned LLM as one · ★1,148 · May 8, 2024 · Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights · ★19 · Oct 9, 2022 · Updated 3 years ago
- DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting · ★17 · Mar 4, 2025 · Updated last year
- ★52 · Feb 19, 2024 · Updated 2 years ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) · ★117 · Mar 20, 2025 · Updated last year