Geralt-Targaryen/Awesome-Speculative-Decoding

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Geralt-Targaryen/Awesome-Speculative-Decoding)

Geralt-Targaryen / Awesome-Speculative-Decoding

Reading notes on Speculative Decoding papers

☆38

Alternatives and similar repositories for Awesome-Speculative-Decoding

Users that are interested in Awesome-Speculative-Decoding are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

rmitbggroup / LearnedIndexDiskExp
View on GitHub
☆14Apr 8, 2023Updated 3 years ago
emperorlu / Learned-RocksDB
View on GitHub
☆16Jul 24, 2023Updated 2 years ago
Geralt-Targaryen / MC-Evaluation
View on GitHub
☆13May 21, 2024Updated 2 years ago
chaohcc / film
View on GitHub
a fully learned index for larger-than-memory databases
☆15Sep 17, 2022Updated 3 years ago
CLR-Lab / SimKO
View on GitHub
SimKO: Simple Pass@K Policy Optimization
☆31Oct 24, 2025Updated 8 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
SDCS-Zyx / SDCS_Zyx
View on GitHub
☆11Feb 15, 2022Updated 4 years ago
alibaba / EfficientAI
View on GitHub
☆48May 9, 2026Updated 2 months ago
NVlabs / RocketKV
View on GitHub
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
☆47Aug 7, 2025Updated 11 months ago
schencoding / lits
View on GitHub
LITS: An Optimized Learned Index for Strings
☆13Jun 18, 2025Updated last year
tangg555 / acl-anthology-helper
View on GitHub
To help search, filter, and download papers from 'acl anthology' (https://aclanthology.org/).
☆18Sep 12, 2024Updated last year
zyxxmu / Bi-Mask
View on GitHub
Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"
☆13Jun 7, 2023Updated 3 years ago
Yang011013 / Awesome-Streaming-Video-Understanding
View on GitHub
Awesome latest models, datasets and benchmarks on streaming/online video understanding.
☆30Oct 19, 2025Updated 8 months ago
Toseic / LLM-inference-arxiv-daily
View on GitHub
🎓Automatically Update LLM inference systems Papers Daily using Github Actions (Update Every 12th hours)
☆12Jun 29, 2026Updated last week
Linking-ai / SCOPE
View on GitHub
(ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
☆36May 28, 2025Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
TerryPei / CSP
View on GitHub
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10Dec 15, 2024Updated last year
mshakirDr / MFTE
View on GitHub
MFTE (Multi Feature Tagger of English) Python is the Python version based on Le Foll's MFTE written in Perl. It is extended to include se…
☆31Jun 1, 2026Updated last month
TianyuFan0504 / awesome-spatio-temporal-graph
View on GitHub
This repository contains a list of papers on spatio-temporal graph, especially about GNNs on S-T graph.
☆18Sep 8, 2023Updated 2 years ago
OpenGVLab / LLMPrune-BESA
View on GitHub
BESA is a differentiable weight pruning technique for large language models.
☆17Mar 4, 2024Updated 2 years ago
TUDB-Labs / Awesome-LLM-LoRA
View on GitHub
☆17May 2, 2024Updated 2 years ago
takahiro-hirofuchi / mesmeric-emulator
View on GitHub
MESMERIC: A Software-based NVM Emulator Supporting Read/Write Asymmetric Latencies
☆10Oct 1, 2020Updated 5 years ago
yuny220 / NAR-Former
View on GitHub
Pytorch code of [CVPR 2023] "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction".
☆11Mar 14, 2023Updated 3 years ago
curtis-sun / TLI
View on GitHub
☆48Jun 10, 2023Updated 3 years ago
YerbaPage / SWE-Debate
View on GitHub
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution [ICSE 2026]
☆33Nov 11, 2025Updated 7 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
AppraiseDev / Appraise
View on GitHub
Appraise code used as part of WMT21 human evaluation campaign
☆30Apr 21, 2026Updated 2 months ago
XiangpengHao / VeryPM
View on GitHub
Persistent Memory Tool Box
☆12Mar 4, 2024Updated 2 years ago
LLMkvsys / rethink-kv-compression
View on GitHub
☆24Mar 7, 2025Updated last year
cares-davinci / MQSim-E
View on GitHub
☆20Apr 18, 2024Updated 2 years ago
zyxxmu / LBC
View on GitHub
Pytorch implementation of our paper accepted by NeurIPS 2022 -- Learning Best Combination for Efficient N:M Sparsity
☆22Jan 13, 2023Updated 3 years ago
schnell18 / lm-quant-toolkit
View on GitHub
LLM Quantization toolkit
☆20Jun 11, 2026Updated 3 weeks ago
peng-weihan / SWE-QA-Bench
View on GitHub
☆54Apr 7, 2026Updated 3 months ago
Scientific-Computing-Lab / STREAMer
View on GitHub
STREAMer: Benchmarking remote volatile and non-volatile memory bandwidth
☆18Aug 21, 2023Updated 2 years ago
PPPP-kaqiu / Awesome-Parallel-Reasoning
View on GitHub
Awesome-Parallel-Reasoning: Unlocking the reasoning potential of LLMs. Papers, Code, Resources & Survey.
☆54Mar 8, 2026Updated 4 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
embryo-labs / Efficient-Disk-Learned-Index
View on GitHub
[SIGMOD’24] Source code for the paper: Making In-Memory Learned Indexes Efficient on Disk
☆14Jun 28, 2024Updated 2 years ago
kuleshov-group / MODULoRA-Experiment
View on GitHub
Evaluation Code repository for the paper "ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers". (2023…
☆13Dec 5, 2023Updated 2 years ago
HobbitQia / 2023_ICS
View on GitHub
☆10Jul 23, 2023Updated 2 years ago
cmd2001 / KVTuner
View on GitHub
[ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
☆29Jan 27, 2026Updated 5 months ago
gre4index / GRE
View on GitHub
GRE is a benchmark suite to compare learned indexes and traditional indexes.
☆55Nov 9, 2022Updated 3 years ago
jlidw / awesome-AI-for-spatial-interpolation-papers
View on GitHub
A professional list of Papers on AI for Spatial Interpolation in AI conferences and journals.
☆19Jul 29, 2024Updated last year
YerbaPage / DetectCodeGPT
View on GitHub
Detection of LLM-Generated Codes [ICSE2025]
☆35Jul 5, 2025Updated last year