romsto/Speculative-Decoding

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/romsto/Speculative-Decoding)

romsto / Speculative-Decoding

Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.

☆111

Alternatives and similar repositories for Speculative-Decoding

Users that are interested in Speculative-Decoding are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆918Aug 22, 2024Updated last year
mscheong01 / speculative_decoding.c
View on GitHub
minimal C implementation of speculative decoding based on llama2.c
☆30Jul 15, 2024Updated last year
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,263Jun 27, 2026Updated last week
sangminwoo / RITUAL
View on GitHub
Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…
☆14Dec 16, 2024Updated last year
KaiLv69 / DuoDecoding
View on GitHub
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
☆19Mar 4, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
jaymody / speculative-sampling
View on GitHub
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆99Aug 20, 2023Updated 2 years ago
javirandor / wdr
View on GitHub
☆10May 18, 2022Updated 4 years ago
sangminwoo / AvisC
View on GitHub
[ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…
☆25Jul 21, 2024Updated last year
shreyansh26 / Speculative-Sampling
View on GitHub
Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by Deepmind
☆111Feb 29, 2024Updated 2 years ago
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆399Apr 22, 2025Updated last year
CLab-HKUST-GZ / micro58-axcore
View on GitHub
☆41Oct 21, 2025Updated 8 months ago
dilab-zju / self-speculative-decoding
View on GitHub
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆229Feb 13, 2025Updated last year
shunzh / mcts-for-llm
View on GitHub
This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.
☆16Jun 28, 2024Updated 2 years ago
meetdavidwan / crg
View on GitHub
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆39Mar 4, 2024Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
ece-fast-lab / ASPLOS-2025-M5
View on GitHub
This is the respository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …
☆17Apr 1, 2025Updated last year
mbzuai-nlp / AudioJailbreak
View on GitHub
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
☆32Oct 6, 2025Updated 9 months ago
Visual-AI / PruneVid
View on GitHub
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆71May 15, 2025Updated last year
flexflow / flexflow-serve
View on GitHub
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆86Sep 15, 2025Updated 9 months ago
UNITES-Lab / Occult
View on GitHub
[ICML‘25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13Apr 17, 2025Updated last year
EfficientLLMSys / MuxServe
View on GitHub
☆15Jun 26, 2024Updated 2 years ago
QingruZhang / PLATON
View on GitHub
This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46Oct 17, 2022Updated 3 years ago
Sys-KU / DSA-Linux
View on GitHub
[IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA
☆16Nov 20, 2024Updated last year
ScienceNLP-Lab / LLM-SSC
View on GitHub
Rhetorical sentence classification using LLMs
☆11Oct 26, 2025Updated 8 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
SiddhantBikram / MemeCLIP
View on GitHub
Official Repository for the paper 'MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification' @ EMNLP 2024
☆25May 1, 2025Updated last year
uservan / speculative_thinking
View on GitHub
☆34Oct 13, 2025Updated 8 months ago
Jikai0Wang / Speculative_CoT
View on GitHub
☆20May 14, 2025Updated last year
Jikai0Wang / OPT-Tree
View on GitHub
☆30May 24, 2025Updated last year
SuDIS-ZJU / llm-inference-all-in-one
View on GitHub
☆19Feb 18, 2025Updated last year
Helloworld10011 / Adversarial-Reasoning
View on GitHub
A new algorithm that formulates jailbreaking as a reasoning problem.
☆26Jul 2, 2025Updated last year
qiuhuachuan / PsyGUARD
View on GitHub
[EMNLP 2024 Oral] PsyGUARD: An Automated System for Suicide Detection and Risk Assessment in Psychological Counseling
☆23Apr 21, 2025Updated last year
SuperFreeMan-CXM / MTL-Using-Uncertainty-to-Weigh-Losses
View on GitHub
☆11May 29, 2020Updated 6 years ago
krafton-ai / lexico
View on GitHub
KV cache compression via sparse coding
☆18Oct 26, 2025Updated 8 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
code-forge-temple / scribe-pal
View on GitHub
ScribePal is an Open Source intelligent browser extension that leverages AI to empower your web experience by providing contextual insigh…
☆21Apr 6, 2026Updated 3 months ago
HPMLL / ZipServ_ASPLOS26
View on GitHub
☆49Dec 19, 2025Updated 6 months ago
MitsuiChen14 / DGTRS
View on GitHub
☆31Jun 10, 2026Updated 3 weeks ago
cefriel / procedural-kg-llm
View on GitHub
Prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs
☆18Feb 17, 2026Updated 4 months ago
BaiTheBest / SRDML
View on GitHub
GitHub Repository for KDD 2022 paper "Saliency-Regularized Deep Multi-Task Learning"
☆12Sep 26, 2023Updated 2 years ago
alisonmitchell / Biomedical-Knowledge-Graph
View on GitHub
Information extraction from unstructured text to build a knowledge graph using techniques from traditional NLP to pre-trained transformer…
☆16May 29, 2026Updated last month
Nayeong-V-Kim / LWBC
View on GitHub
☆21Apr 10, 2023Updated 3 years ago