HArmonizedSS/HASS

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/HArmonizedSS/HASS)

HArmonizedSS / HASS

Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)

☆56

Alternatives and similar repositories for HASS

Users that are interested in HASS are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

hsj576 / GRIFFIN
View on GitHub
Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"[NeurIPS 2025]
☆19May 12, 2025Updated last year
thunlp / FR-Spec
View on GitHub
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
☆55Jul 15, 2025Updated last year
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,471Feb 20, 2026Updated 5 months ago
hyx1999 / SAM-Decoding
View on GitHub
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
☆52May 12, 2026Updated 2 months ago
NonvolatileMemory / GliDe_with_a_CaPE_ICML_24
View on GitHub
official code for GliDe with a CaPE
☆22Aug 13, 2024Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
NickL77 / BaldEagle
View on GitHub
3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding
☆85Jul 3, 2025Updated last year
zhzihao / Learning-to-Draft
View on GitHub
Official implementation of "Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning" (ICLR 2026)
☆19Mar 1, 2026Updated 4 months ago
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
haiduo / Jakiro
View on GitHub
This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" [ACL 2026 Mai…
☆37Oct 5, 2025Updated 9 months ago
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,276Jun 27, 2026Updated 3 weeks ago
Jikai0Wang / OPT-Tree
View on GitHub
☆30May 24, 2025Updated last year
KangJialiang / ViSpec
View on GitHub
[NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding.
☆63Jan 28, 2026Updated 5 months ago
joonkeekim / Instructive-Decoding
View on GitHub
Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…
☆21Mar 7, 2024Updated 2 years ago
garipovroma / autojudge
View on GitHub
[NeurIPS 2025] Official PyTorch implementation for the paper AutoJudge: Judge Decoding Without Manual Annotation
☆21Dec 22, 2025Updated 6 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
sgl-project / SpecForge
View on GitHub
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆997Updated this week
smart-lty / ParallelSpeculativeDecoding
View on GitHub
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆169Dec 23, 2025Updated 6 months ago
AMD-AGI / Gumiho
View on GitHub
Official Implementation of "Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding" (ICML'25)
☆36May 14, 2026Updated 2 months ago
GATECH-EIC / Linearized-LLM
View on GitHub
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35Jun 12, 2024Updated 2 years ago
LiuXiaoxuanPKU / OSD
View on GitHub
☆68Dec 3, 2024Updated last year
ruipeterpan / specreason
View on GitHub
PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [NeurIPS '25]
☆74Oct 2, 2025Updated 9 months ago
smart-lty / nano-PEARL
View on GitHub
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆210Mar 18, 2026Updated 4 months ago
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,757Jun 25, 2024Updated 2 years ago
dilab-zju / self-speculative-decoding
View on GitHub
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆230Feb 13, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
Jingyu6 / speculative_prefill
View on GitHub
☆63May 19, 2025Updated last year
ZhouYuxuanYX / Hierarchical-Speculative-Decoding
View on GitHub
Hierarchical Speculative Decoding is the SOTA verification algorithm for lossless accelerated LLM inference.
☆24Apr 14, 2026Updated 3 months ago
FasterDecoding / REST
View on GitHub
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220Mar 5, 2026Updated 4 months ago
Lyn-Lucy / MSD
View on GitHub
☆37Jul 21, 2025Updated last year
d-matrix-ai / keyformer-llm
View on GitHub
Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning
☆57Mar 26, 2024Updated 2 years ago
UNITES-Lab / MoE-Quantization
View on GitHub
Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark"
☆30Jun 30, 2025Updated last year
SeanLeng1 / CrossWordBench
View on GitHub
☆12Apr 18, 2025Updated last year
google / iopddl
View on GitHub
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25May 12, 2025Updated last year
Jiaju-Chen / UpliftRec
View on GitHub
this is a work about UpliftRec
☆10Dec 10, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
xgi-org / xgi-data
View on GitHub
Standardized higher-order datasets with corresponding datasheets
☆23Jun 16, 2026Updated last month
Dogacel / Attention-Drift
View on GitHub
Code for the paper *Attention Drift: What Speculative Decoding Models Learn*.
☆27May 12, 2026Updated 2 months ago
HandH1998 / QQQ
View on GitHub
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆157Aug 21, 2025Updated 11 months ago
mit-han-lab / duo-attention
View on GitHub
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539Feb 10, 2025Updated last year
Equationliu / Kangaroo
View on GitHub
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆72Jun 26, 2024Updated 2 years ago
gmlwns2000 / sea-attention
View on GitHub
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
☆12Jun 20, 2025Updated last year
rahulguptakota / paper-To-Reviewer-Matching-System
View on GitHub
Paper to Reviewer Assignment is a tedious but a very crucial job for conference organizers. Till date the Toronto Paper Matching System (…
☆10Nov 30, 2017Updated 8 years ago