[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆96Feb 6, 2024Updated 2 years ago
Alternatives and similar repositories for BigLittleDecoder
Users that are interested in BigLittleDecoder are comparing it to the libraries listed below
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023]☆14Jul 11, 2023Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- Multi-Candidate Speculative Decoding☆40Apr 22, 2024Updated last year
- Fast inference from large language models via speculative decoding☆899Aug 22, 2024Updated last year
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…☆21Sep 10, 2024Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**☆219Feb 13, 2025Updated last year
- Cascade Speculative Drafting☆33Apr 2, 2024Updated last year
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion from our EMNLP 2023 paper - Accelerating Toeplitz…☆14Oct 17, 2023Updated 2 years ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2.☆99Aug 20, 2023Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 3 years ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)☆65Sep 28, 2024Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024☆215Mar 5, 2026Updated 2 weeks ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference☆57Nov 20, 2024Updated last year
- Variable-order CRFs with structure learning☆17Aug 1, 2024Updated last year
- ☆26Nov 23, 2023Updated 2 years ago
- scalable and robust tree-based speculative decoding algorithm☆372Jan 28, 2025Updated last year
- ☆11Oct 11, 2023Updated 2 years ago
- ☆13Feb 7, 2023Updated 3 years ago
- Calculating Expected Time for training LLM.☆38Apr 17, 2023Updated 2 years ago
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers☆192Feb 28, 2023Updated 3 years ago
- Open Source Projects from Pallas Lab☆21Oct 10, 2021Updated 4 years ago
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 2 years ago
- Implementation of QKVAE☆11Feb 24, 2023Updated 3 years ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆372Apr 22, 2025Updated 11 months ago
- lanmt ebm☆12Jun 19, 2020Updated 5 years ago
- Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC☆50Jan 20, 2026Updated 2 months ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- ☆21Feb 5, 2024Updated 2 years ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads☆2,719Jun 25, 2024Updated last year
- Official code for the paper "Attention as a Hypernetwork"☆54Feb 24, 2026Updated 3 weeks ago
- Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)☆46Dec 9, 2023Updated 2 years ago
- ☆129Jan 22, 2024Updated 2 years ago
- [ICCAD 2025] Squant☆15Jul 3, 2025Updated 8 months ago
- ☆17Dec 19, 2024Updated last year
- ☆26May 30, 2023Updated 2 years ago
- Bag of Instances Aggregation Boosts Self-supervised Distillation (ICLR 2022)☆33Apr 26, 2022Updated 3 years ago
- Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024)☆10Oct 16, 2024Updated last year
- A repository for research on medium sized language models.☆78May 23, 2024Updated last year
- source code of COLING2020 "Second-Order Unsupervised Neural Dependency Parsing"☆16Oct 24, 2022Updated 3 years ago