smart-lty/ParallelSpeculativeDecoding

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/smart-lty/ParallelSpeculativeDecoding)

smart-lty / ParallelSpeculativeDecoding

[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length

☆170

Alternatives and similar repositories for ParallelSpeculativeDecoding

Users that are interested in ParallelSpeculativeDecoding are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

smart-lty / nano-PEARL
View on GitHub
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆210Mar 18, 2026Updated 4 months ago
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,277Jun 27, 2026Updated 3 weeks ago
smart-lty / REST
View on GitHub
The code of paper "Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction" in NeurIPS 2023.
☆14Nov 25, 2023Updated 2 years ago
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
KaiLv69 / DuoDecoding
View on GitHub
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
☆19Mar 4, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
jianuo-huang / Domino
View on GitHub
Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.
☆121Updated this week
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,473Feb 20, 2026Updated 5 months ago
sgl-project / SpecForge
View on GitHub
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆1,000Updated this week
hyx1999 / SAM-Decoding
View on GitHub
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆52May 12, 2026Updated 2 months ago
thunlp / Ouroboros
View on GitHub
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆117Mar 20, 2025Updated last year
Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
hemingkx / SWIFT
View on GitHub
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆70Feb 21, 2025Updated last year
feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆920Aug 22, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
ruipeterpan / specreason
View on GitHub
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆74Oct 2, 2025Updated 9 months ago
electron-shaders / MineDraft
View on GitHub
☆38Jun 23, 2026Updated 3 weeks ago
Kaffaljidhmah2 / SpecDec_pp
View on GitHub
Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
☆19Jul 10, 2025Updated last year
HArmonizedSS / HASS
View on GitHub
Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
☆56Mar 14, 2025Updated last year
mit-han-lab / fastrl
View on GitHub
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆174Feb 27, 2026Updated 4 months ago
SNU-ARC / any-precision-llm
View on GitHub
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆130Jul 4, 2025Updated last year
mit-han-lab / duo-attention
View on GitHub
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539Feb 10, 2025Updated last year
microsoft / MInference
View on GitHub
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,221Apr 8, 2026Updated 3 months ago
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆850Mar 6, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
thunlp / FR-Spec
View on GitHub
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆55Jul 15, 2025Updated last year
AMD-AGI / PARD
View on GitHub
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation (ICLR 26)
☆33Jun 10, 2026Updated last month
FasterDecoding / REST
View on GitHub
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220Mar 5, 2026Updated 4 months ago
NonvolatileMemory / flash_tree_attn
View on GitHub
☆20Dec 24, 2024Updated last year
Equationliu / Kangaroo
View on GitHub
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆72Jun 26, 2024Updated 2 years ago
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago
madsys-dev / deepseekv2-profile
View on GitHub
☆156Mar 4, 2025Updated last year
October2001 / Awesome-KV-Cache-Compression
View on GitHub
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆726Apr 15, 2026Updated 3 months ago
ByteDance-Seed / ShadowKV
View on GitHub
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆310May 1, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Lyn-Lucy / MSD
View on GitHub
☆37Jul 21, 2025Updated last year
MIRALab-USTC / DD-RetroDCVAE
View on GitHub
A novel template-free retrosynthesizer that can generate diverse sets of reactants for a desired product via discrete conditional variati…
☆15Aug 7, 2022Updated 3 years ago
Infini-AI-Lab / MagicDec
View on GitHub
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆154Dec 4, 2024Updated last year
hao-ai-lab / LookaheadDecoding
View on GitHub
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,340Mar 6, 2025Updated last year
snowflakedb / ArcticInference
View on GitHub
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆460Jul 14, 2026Updated last week
jy-yuan / KIVI
View on GitHub
[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
☆419Nov 20, 2025Updated 8 months ago
microsoft / sarathi-serve
View on GitHub
A low-latency & high-throughput serving engine for LLMs
☆512Jan 8, 2026Updated 6 months ago