yuezhouhu/adaspec

yuezhouhu / adaspec

A selective knowledge distillation algorithm for efficient speculative decoders

☆39

Alternatives and similar repositories for adaspec

Users who are interested in adaspec are comparing it to the repositories listed below.

yuezhouhu / residual-context-diffusion
[ICML 2026] Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs.
☆58 · Jun 28, 2026 · Updated 3 weeks ago
yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆62 · Dec 8, 2024 · Updated last year
AMD-AGI / PARD
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation (ICLR 26)
☆33 · Jun 10, 2026 · Updated last month
thu-ml / Jetfire-INT8Training
☆63 · Jul 21, 2024 · Updated 2 years ago
sth1997 / GraphSet
☆10 · Nov 28, 2023 · Updated 2 years ago
LiuXiaoxuanPKU / OSD
☆68 · Dec 3, 2024 · Updated last year
Adlik / model_zoo
☆11 · Dec 26, 2025 · Updated 6 months ago
zhzihao / Learning-to-Draft
Official implementation of "Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning" (ICLR 2026)
☆20 · Mar 1, 2026 · Updated 4 months ago
smart-lty / nano-PEARL
Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding.
☆211 · Mar 18, 2026 · Updated 4 months ago
svg-project / Quant-VideoGen
[ICML2026] Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
☆61 · Jun 4, 2026 · Updated last month
cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32 · Jul 17, 2023 · Updated 3 years ago
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
NYCU-EDgeAi / subspec
[NeurIPS 2025] Speculate Deep and Accurate
☆22 · Jan 16, 2026 · Updated 6 months ago
coder-qicao / DreamPRM-1.5
We introduce DreamPRM-1.5, an instance-reweighted framework that adaptively adjusts the importance of each training example via bi-level …
☆16 · Nov 13, 2025 · Updated 8 months ago
real-absolute-AI / RAPID
[ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding
☆23 · Mar 2, 2025 · Updated last year
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
☆47 · Jun 11, 2025 · Updated last year
Infini-AI-Lab / vortex_torch
Vortex: Programmable Sparse Attention for Agents as Algorithm Designers
☆67 · Jun 24, 2026 · Updated last month
thu-ml / TetraJet-MXFP4Training
Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆40 · May 4, 2026 · Updated 2 months ago
killthefullmoon / MMSpec
MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
☆41 · Jul 2, 2026 · Updated 3 weeks ago
jwkirchenbauer / mtp-lm
Source code to accompany research paper on training multi token prediction language models using self-distillation.
☆39 · Feb 21, 2026 · Updated 5 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆232 · May 31, 2025 · Updated last year
ruikangliu / Quantized-Reasoning-Models
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆77 · Jul 8, 2025 · Updated last year
JTWang2000 / NICE
NICE: Non-differentiable evaluation metric-based InfluenCe Estimation
☆16 · Jul 7, 2025 · Updated last year
AkideLiu / MiniCache
☆14 · Sep 7, 2024 · Updated last year
electron-shaders / MineDraft
☆38 · Jun 23, 2026 · Updated last month
Kaffaljidhmah2 / SpecDec_pp
Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
☆19 · Jul 10, 2025 · Updated last year
SqueezeAILab / CDLM
CDLM: Consistency Diffusion Language Models for Faster Sampling
☆41 · Nov 25, 2025 · Updated 8 months ago
thunlp / KG-Infused-RAG
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
☆24 · Jan 18, 2026 · Updated 6 months ago
nbasyl / OFQ
The official implementation of the ICML 2023 paper OFQ-ViT
☆39 · Oct 3, 2023 · Updated 2 years ago
hsj576 / GRIFFIN
Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding" [NeurIPS 2025]
☆19 · May 12, 2025 · Updated last year
thu-ml / SLA
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
☆324 · Feb 24, 2026 · Updated 5 months ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
coder-qicao / DreamPRM
DreamPRM tackles the dataset quality imbalance and distribution shift that plague multimodal PRM training by domain-reweighting.
☆24 · Sep 6, 2025 · Updated 10 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆155 · Dec 4, 2024 · Updated last year
naimengye / speculative-action
☆30 · Mar 9, 2026 · Updated 4 months ago
Infini-AI-Lab / gsm_infinite
☆65 · Jun 12, 2025 · Updated last year
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆280 · Jul 6, 2025 · Updated last year
Zengwh02 / GlimpRouter
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
☆16 · Apr 24, 2026 · Updated 3 months ago
hemingkx / Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401 · Apr 22, 2025 · Updated last year