kssteven418/BigLittleDecoder

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kssteven418/BigLittleDecoder)

kssteven418 / BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

☆99

Alternatives and similar repositories for BigLittleDecoder

Users that are interested in BigLittleDecoder are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lucidrains / speculative-decoding
View on GitHub
Explorations into some recent techniques surrounding speculative decoding
☆307Dec 22, 2024Updated last year
Yuanhy1997 / HyPe
View on GitHub
HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]
☆14Jul 11, 2023Updated 3 years ago
jenni-ai / T2FW
View on GitHub
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20Oct 9, 2022Updated 3 years ago
NJUNLP / MCSD
View on GitHub
Multi-Candidate Speculative Decoding
☆41Apr 22, 2024Updated 2 years ago
Infini-AI-Lab / Sirius
View on GitHub
Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…
☆21Sep 10, 2024Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆920Aug 22, 2024Updated last year
dilab-zju / self-speculative-decoding
View on GitHub
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆230Feb 13, 2025Updated last year
lfsszd / CS-Drafting
View on GitHub
Cascade Speculative Drafting
☆33Apr 2, 2024Updated 2 years ago
OpenNLPLab / ETSC-Exact-Toeplitz-to-SSM-Conversion
View on GitHub
[EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…
☆14Oct 17, 2023Updated 2 years ago
jaymody / speculative-sampling
View on GitHub
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆99Aug 20, 2023Updated 2 years ago
da03 / criticize_text_generation
View on GitHub
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆12Mar 18, 2023Updated 3 years ago
raymin0223 / fast_robust_early_exit
View on GitHub
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆67Sep 28, 2024Updated last year
FasterDecoding / REST
View on GitHub
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆220Mar 5, 2026Updated 4 months ago
timvieira / vocrf
View on GitHub
Variable-order CRFs with structure learning
☆17Aug 1, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
prateeky2806 / ComPEFT
View on GitHub
☆26Nov 23, 2023Updated 2 years ago
Infini-AI-Lab / Sequoia
View on GitHub
scalable and robust tree-based speculative decoding algorithm
☆376Jan 28, 2025Updated last year
Doraemonzzz / nanoTransNormer
View on GitHub
☆11Oct 11, 2023Updated 2 years ago
emorynlp / seq2seq-corenlp
View on GitHub
☆13Feb 7, 2023Updated 3 years ago
jason9693 / ETA4LLMs
View on GitHub
Calculating Expected Time for training LLM.
☆39Apr 17, 2023Updated 3 years ago
WoosukKwon / retraining-free-pruning
View on GitHub
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆197Feb 28, 2023Updated 3 years ago
Ryu1845 / hyena-jax
View on GitHub
Implementation of Hyena Hierarchy in JAX
☆10Apr 30, 2023Updated 3 years ago
machelreid / editpro
View on GitHub
Learning to Model Editing Processes
☆26Aug 3, 2025Updated 11 months ago
ghazi-f / QKVAE
View on GitHub
Implementation of QKVAE
☆11Feb 24, 2023Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
hemingkx / Spec-Bench
View on GitHub
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
☆401Apr 22, 2025Updated last year
zomux / lanmt-ebm
View on GitHub
lanmt ebm
☆12Jun 19, 2020Updated 6 years ago
kssteven418 / SqueezeLLM-gradients
View on GitHub
☆21Feb 5, 2024Updated 2 years ago
srush / tangent
View on GitHub
Source-to-Source Debuggable Derivatives in Pure Python
☆15Jan 23, 2024Updated 2 years ago
hemingkx / SpecDec
View on GitHub
Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
☆47Dec 9, 2023Updated 2 years ago
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,757Jun 25, 2024Updated 2 years ago
AlirezaMorsali / MLP-Attention
View on GitHub
☆17Dec 19, 2024Updated last year
HanGuo97 / lq-lora
View on GitHub
☆129Jan 22, 2024Updated 2 years ago
shawnricecake / squant
View on GitHub
[ICCAD 2025] Squant
☆15Jul 3, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
SqueezeAILab / SqueezedAttention
View on GitHub
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆58Nov 20, 2024Updated last year
ctlllll / reward_collapse
View on GitHub
☆26May 30, 2023Updated 3 years ago
TRI-ML / linear_open_lm
View on GitHub
A repository for research on medium sized language models.
☆78May 23, 2024Updated 2 years ago
haohang96 / bingo
View on GitHub
Bag of Instances Aggregation Boosts Self-supervised Distillation (ICLR 2022)
☆33Apr 26, 2022Updated 4 years ago
thunlp / Modularity-Analysis
View on GitHub
[ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers
☆26Jun 7, 2023Updated 3 years ago
kimyuji / EvolvingQA_benchmark
View on GitHub
Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024)
☆10Oct 16, 2024Updated last year
sustcsonglin / second-order-neural-dmv
View on GitHub
source code of COLING2020 "Second-Order Unsupervised Neural Dependency Parsing"
☆16Oct 24, 2022Updated 3 years ago