hyx1999/SAM-Decoding

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/hyx1999/SAM-Decoding)

hyx1999 / SAM-Decoding

Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton

☆52

Alternatives and similar repositories for SAM-Decoding

Users that are interested in SAM-Decoding are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Luowaterbi / TokenRecycling
View on GitHub
[ACL2025 Oral🔥]Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
☆29Nov 11, 2025Updated 8 months ago
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
NonvolatileMemory / flash_tree_attn
View on GitHub
☆20Dec 24, 2024Updated last year
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
HArmonizedSS / HASS
View on GitHub
Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
☆56Mar 14, 2025Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
thunlp / FR-Spec
View on GitHub
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆55Jul 15, 2025Updated last year
smart-lty / ParallelSpeculativeDecoding
View on GitHub
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆169Dec 23, 2025Updated 6 months ago
zju-jiyicheng / SpecVLM
View on GitHub
[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
☆48Apr 16, 2026Updated 3 months ago
smart-lty / nano-PEARL
View on GitHub
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆210Mar 18, 2026Updated 4 months ago
FasterDecoding / REST
View on GitHub
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220Mar 5, 2026Updated 4 months ago
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,276Jun 27, 2026Updated 3 weeks ago
KangJialiang / ViSpec
View on GitHub
[NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding.
☆63Jan 28, 2026Updated 5 months ago
mit-han-lab / fastrl
View on GitHub
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆174Feb 27, 2026Updated 4 months ago
LINs-lab / DeFT
View on GitHub
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
☆54Jun 17, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
apoorvumang / prompt-lookup-decoding
View on GitHub
Simple speculative decoding technique, integrated in vLLM and transformers
☆611Aug 23, 2024Updated last year
hemingkx / SWIFT
View on GitHub
[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
☆70Feb 21, 2025Updated last year
LiuXiaoxuanPKU / OSD
View on GitHub
☆68Dec 3, 2024Updated last year
snowflakedb / ArcticInference
View on GitHub
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆459Updated this week
sail-sg / LightTrans
View on GitHub
The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
☆22Apr 22, 2025Updated last year
NonvolatileMemory / GliDe_with_a_CaPE_ICML_24
View on GitHub
official code for GliDe with a CaPE
☆22Aug 13, 2024Updated last year
Zanette-Labs / SpeculativeRejection
View on GitHub
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆56Oct 29, 2024Updated last year
zju-jiyicheng / LVSpec
View on GitHub
[ACL 2026 Main] See the Forest for the Trees: Loosely Speculative Decoding via Visual-Semantic Guidance for Efficient Inference of Video …
☆27Jul 4, 2026Updated 2 weeks ago
hsj576 / GRIFFIN
View on GitHub
Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"[NeurIPS 2025]
☆19May 12, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
Lyn-Lucy / MSD
View on GitHub
☆37Jul 21, 2025Updated 11 months ago
jianuo-huang / Domino
View on GitHub
Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.
☆120Updated this week
thunlp / Ouroboros
View on GitHub
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆118Mar 20, 2025Updated last year
YangCao28 / nano-SGLang
View on GitHub
Nano SGLang
☆16Jul 21, 2025Updated last year
Xalp / MARS
View on GitHub
Official Implementation of MARS
☆30Apr 21, 2026Updated 2 months ago
LLMkvsys / rethink-kv-compression
View on GitHub
☆24Mar 7, 2025Updated last year
electron-shaders / MineDraft
View on GitHub
☆38Jun 23, 2026Updated 3 weeks ago
wangqinsi1 / CoreInfer
View on GitHub
This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…
☆18Oct 25, 2024Updated last year
Adaxry / Unified_Layer_Skipping
View on GitHub
☆15Apr 11, 2024Updated 2 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
Odysseusq / VLCache
View on GitHub
Official Repo for paper "VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference"
☆16Mar 28, 2026Updated 3 months ago
EIT-NLP / SkipGPT
View on GitHub
[ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". …
☆21Nov 17, 2025Updated 8 months ago
eth-easl / dirigent
View on GitHub
Dirigent: Lightweight Serverless Orchestration
☆44Aug 26, 2025Updated 10 months ago
Blueyee / Efficient-CoT-LRMs
View on GitHub
Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!
☆71Apr 1, 2025Updated last year
liranringel / ddtree
View on GitHub
☆389Apr 16, 2026Updated 3 months ago
Kamichanw / CoS
View on GitHub
[ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"
☆30Jun 23, 2025Updated last year
Linking-ai / SCOPE
View on GitHub
(ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
☆36May 28, 2025Updated last year