jaymody/speculative-sampling

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jaymody/speculative-sampling)

jaymody / speculative-sampling

Simple implementation of Speculative Sampling in NumPy for GPT-2.

☆99

Alternatives and similar repositories for speculative-sampling

Users that are interested in speculative-sampling are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lucidrains / speculative-decoding
View on GitHub
Explorations into some recent techniques surrounding speculative decoding
☆307Dec 22, 2024Updated last year
fattorib / Little-GPT
View on GitHub
GPT* - Training faster small transformers using ALiBi, Parallel Residual Connections and more!
☆20Oct 29, 2022Updated 3 years ago
feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆922Aug 22, 2024Updated last year
dilab-zju / self-speculative-decoding
View on GitHub
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆230Feb 13, 2025Updated last year
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
casetext / r-and-r
View on GitHub
Code for the "Long Context Needs Some R&R" paper.
☆12Mar 11, 2024Updated 2 years ago
hyunwoongko / beyond-lm
View on GitHub
Beyond LM: How can language model go forward in the future?
☆15Apr 30, 2023Updated 3 years ago
yzhaiustc / Optimizing-SGEMV-on-NVIDIA-GPUs
View on GitHub
An implementation of SGEMV with performance comparable to cuBLAS.
☆12May 21, 2021Updated 5 years ago
awslabs / extending-the-context-length-of-open-source-llms
View on GitHub
☆56Jun 26, 2025Updated last year
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,758Jun 25, 2024Updated 2 years ago
haiduo / Jakiro
View on GitHub
This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" [ACL 2026 Mai…
☆37Oct 5, 2025Updated 9 months ago
Langboat / mengzi-retrieval-lm
View on GitHub
An experimental implementation of the retrieval-enhanced language model
☆74Dec 29, 2022Updated 3 years ago
pacman100 / peft-codegen-25
View on GitHub
☆23Jul 10, 2023Updated 3 years ago
apoorvumang / prompt-lookup-decoding
View on GitHub
Simple speculative decoding technique, integrated in vLLM and transformers
☆611Aug 23, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
hunkim / chatgpt-askup-search-plugin
View on GitHub
AskUp Search ChatGPT Plugin
☆20May 27, 2023Updated 3 years ago
jason9693 / ETA4LLMs
View on GitHub
Calculating Expected Time for training LLM.
☆39Apr 17, 2023Updated 3 years ago
ttumiel / minRLHF
View on GitHub
Minimal RLHF implementation built on top of minGPT.
☆32Jul 4, 2024Updated 2 years ago
SqueezeAILab / SqueezeLLM
View on GitHub
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆722Aug 13, 2024Updated last year
hao-ai-lab / LookaheadDecoding
View on GitHub
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,340Mar 6, 2025Updated last year
Noahs-ARK / RFA
View on GitHub
☆33Apr 12, 2021Updated 5 years ago
bespokelabsai / verifiers
View on GitHub
Verifiers for LLM Reinforcement Learning
☆81Jul 17, 2026Updated last week
DTennant / distill_visual_priors
View on GitHub
2nd place solution of ECCV 2020 workshop VIPriors Image Classification Challenge, https://arxiv.org/abs/2008.00261
☆13Aug 22, 2021Updated 4 years ago
monologg / korean-hate-speech-koelectra
View on GitHub
Bias, Hate classification with KoELECTRA 👿
☆27Jun 12, 2023Updated 3 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
AIRC-KETI / ke-t5-downstreams
View on GitHub
☆39Mar 25, 2024Updated 2 years ago
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,478Feb 20, 2026Updated 5 months ago
simonjisu / annotated-transformer-kr
View on GitHub
annotated-transformer-kr
☆15May 16, 2019Updated 7 years ago
jxpress / setfit-pytorch-lightning
View on GitHub
☆44Apr 22, 2026Updated 3 months ago
romsto / Speculative-Decoding
View on GitHub
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.
☆112Dec 2, 2024Updated last year
nayohan / SimKoR
View on GitHub
[HCLT 2022] Korean sentence text similarity dataset using naver shopping review
☆25Oct 20, 2022Updated 3 years ago
NVIDIA / Star-Attention
View on GitHub
Efficient LLM Inference over Long Sequences
☆392Jun 25, 2025Updated last year
naver-ai / carecall-corpus
View on GitHub
CareCall for Seniors: Role Specified Open-Domain Dialogue dataset generated by leveraging LLMs (NAACL 2022).
☆62May 3, 2022Updated 4 years ago
deep-spin / OpenNMT-entmax
View on GitHub
☆15May 14, 2019Updated 7 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
SimiaoZuo / MoEBERT
View on GitHub
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆114May 2, 2022Updated 4 years ago
yifeiwang77 / Self-Correction
View on GitHub
☆20Nov 3, 2024Updated last year
Macielyoung / Confused_Chinese
View on GitHub
Fetching confused chars, including same pronunciation, similar pronunciation and similar character pattern
☆21Jan 20, 2023Updated 3 years ago
DOUDOU0314 / GPT-J-hf
View on GitHub
GPT-jax based on the official huggingface library
☆13Jun 22, 2021Updated 5 years ago
seopbo / nlp_tutorials
View on GitHub
huggingface를 이용하여 downstream task 수행하기
☆62Dec 28, 2021Updated 4 years ago
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,281Jun 27, 2026Updated 3 weeks ago
XinyuHua / pair-emnlp2020
View on GitHub
Official repository for "PAIR: Planning and Iterative Refinement in Pre-trained Transformers for Long Text Generation"
☆31Apr 17, 2021Updated 5 years ago