Multi-Candidate Speculative Decoding
☆39Apr 22, 2024Updated last year
Alternatives and similar repositories for MCSD
Users that are interested in MCSD are comparing it to the libraries listed below
Sorting:
- ☆28May 24, 2025Updated 9 months ago
- Cascade Speculative Drafting☆32Apr 2, 2024Updated last year
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆369Apr 22, 2025Updated 10 months ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024☆214Sep 11, 2025Updated 5 months ago
- ☆26Mar 14, 2024Updated last year
- ☆11Feb 5, 2026Updated 3 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**☆214Feb 13, 2025Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)☆114Mar 20, 2025Updated 11 months ago
- Fast inference from large lauguage models via speculative decoding☆888Aug 22, 2024Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️☆1,126Jan 24, 2026Updated last month
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆65Sep 28, 2024Updated last year
- [ACL 2024] Official PyTorch implementation of "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"☆47May 24, 2024Updated last year
- ☆34Aug 23, 2023Updated 2 years ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better☆16Feb 15, 2025Updated last year
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.☆26Apr 15, 2025Updated 10 months ago
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.☆31Dec 6, 2023Updated 2 years ago
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs).☆11May 24, 2024Updated last year
- ☆14Aug 19, 2024Updated last year
- Code for paper "ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection" (MobiSys'23)☆14Nov 1, 2023Updated 2 years ago
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context☆18Nov 15, 2024Updated last year
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". …☆20Nov 17, 2025Updated 3 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length☆147Dec 23, 2025Updated 2 months ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆31Apr 8, 2024Updated last year
- ☆64Dec 3, 2024Updated last year
- ☆41Mar 28, 2024Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).☆2,188Feb 20, 2026Updated last week
- ☆66Nov 4, 2024Updated last year
- This repo is to demo the concept of lossless compression with Transformers as encoder and decoder.☆14May 2, 2024Updated last year
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆14Nov 27, 2024Updated last year
- [NeurIPS'23] Speculative Decoding with Big Little Decoder☆97Feb 6, 2024Updated 2 years ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding☆277Aug 31, 2024Updated last year
- The official implementation of the DAC 2024 paper GQA-LUT☆20Dec 20, 2024Updated last year
- Scaling Sparse Fine-Tuning to Large Language Models☆18Jan 31, 2024Updated 2 years ago
- Loop Nest - Linear algebra compiler and code generator.☆20Oct 22, 2022Updated 3 years ago
- ☆23Jan 27, 2025Updated last year
- Implementation of "Decoding-time Realignment of Language Models", ICML 2024.☆21Jun 17, 2024Updated last year
- Triton-based implementation of Sparse Mixture of Experts.☆266Oct 3, 2025Updated 4 months ago
- ☆235Jun 11, 2024Updated last year
- This repo is reproduction resources for linear alignment paper, still working☆17May 19, 2024Updated last year