shreyansh26/Speculative-Sampling

Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind

☆111

Alternatives and similar repositories for Speculative-Sampling

Users who are interested in Speculative-Sampling are comparing it to the libraries listed below.

feifeibear / LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
☆922 · Aug 22, 2024 · Updated last year
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
☆307 · Dec 22, 2024 · Updated last year
FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,758 · Jun 25, 2024 · Updated 2 years ago
d-matrix-ai / keyformer-llm
Keyformer proposes KV cache reduction through key-token identification, without the need for fine-tuning
☆57 · Mar 26, 2024 · Updated 2 years ago
FasterDecoding / REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220 · Mar 5, 2026 · Updated 4 months ago
KaiLv69 / DuoDecoding
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
☆19 · Mar 4, 2025 · Updated last year
dilab-zju / self-speculative-decoding
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
☆230 · Feb 13, 2025 · Updated last year
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401 · Apr 22, 2025 · Updated last year
NoakLiu / Awesome-Distributed-RL
A Collection for Distributed Reinforcement Learning Papers
☆18 · Sep 24, 2025 · Updated 10 months ago
weishengying / cute_gemm
☆23 · Aug 14, 2024 · Updated last year
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆37 · Oct 9, 2025 · Updated 9 months ago
zzbright1998 / SentenceKV
Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache com…
☆15 · Sep 29, 2025 · Updated 9 months ago
VITA-Group / Q-Hitter
☆15 · Jun 4, 2024 · Updated 2 years ago
CODEJIN / MLPSinger
☆24 · Mar 15, 2022 · Updated 4 years ago
smallbenchnlp / ELECTRA-DeBERTa
☆16 · Dec 14, 2022 · Updated 3 years ago
NL2Code / CodeM
☆44 · Jun 2, 2024 · Updated 2 years ago
kztakemoto / simbaja
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17 · Apr 24, 2024 · Updated 2 years ago
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆416 · Nov 16, 2024 · Updated last year
thu-spmi / ST-NAS
Efficient Neural Architecture Search via Straight-Through Gradients
☆13 · Nov 12, 2020 · Updated 5 years ago
ConstantPark / Parallel_Development_Community_GPGPU_Study
☆14 · Dec 16, 2020 · Updated 5 years ago
WooSunghyeon / dropbp
The official code for Dropping Backward Propagation (DropBP)
☆32 · Oct 29, 2024 · Updated last year
Yaoming95 / CIAT
code repo for EMNLP'21 Finding Counter-Interference Adapter for Multilingual Machine Translation
☆18 · Oct 19, 2022 · Updated 3 years ago
liuhuang31 / g2pw_once
G2pw's inference speed is accelerated by about 8-10 times. Change loop generated predictive data to only once and model loop prediction b…
☆14 · Dec 30, 2023 · Updated 2 years ago
samsad35 / source-filter-vae
[SpeechCom Journal] Learning and controlling the source-filter representation of speech with a variational autoencoder
☆46 · Apr 18, 2023 · Updated 3 years ago
redmist328 / APNet2
Source code of APNet2, a vocoder
☆60 · Nov 23, 2023 · Updated 2 years ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆344 · Jul 2, 2024 · Updated 2 years ago
reppy4620 / vocoders
My vocoder experiments
☆31 · Jul 26, 2025 · Updated last year
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆183 · Jul 12, 2024 · Updated 2 years ago
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175 · Jun 20, 2024 · Updated 2 years ago
MichaelZhouwang / Sequence_Span_Rewriting
Code for EMNLP 2021 paper: Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
☆17 · Nov 30, 2021 · Updated 4 years ago
samsad35 / code-ancogen
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14 · Mar 11, 2025 · Updated last year
goliaro / specinfer-ae
☆28 · Mar 14, 2024 · Updated 2 years ago
haiciyang / LaDiffCodec
ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.
☆56 · Nov 16, 2025 · Updated 8 months ago
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆72 · Jun 26, 2024 · Updated 2 years ago
hanshounsu / d3rm
☆14 · Feb 3, 2026 · Updated 5 months ago
hemingkx / SpecDec
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
☆47 · Dec 9, 2023 · Updated 2 years ago
romsto / Speculative-Decoding
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding", Leviathan et al. 2023.
☆112 · Dec 2, 2024 · Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 · Sep 22, 2024 · Updated last year
RickySkywalker / Synthesis_Step-by-Step_Official
☆24 · Feb 5, 2024 · Updated 2 years ago