[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆97 · Feb 6, 2024 · Updated 2 years ago
Alternatives and similar repositories for BigLittleDecoder
Users who are interested in BigLittleDecoder are comparing it to the libraries listed below.
- Explorations into some recent techniques surrounding speculative decoding ☆300 · Dec 22, 2024 · Updated last year
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- Multi-Candidate Speculative Decoding ☆40 · Apr 22, 2024 · Updated 2 years ago
- Fast inference from large language models via speculative decoding ☆914 · Aug 22, 2024 · Updated last year
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… ☆21 · Sep 10, 2024 · Updated last year
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆226 · Feb 13, 2025 · Updated last year
- Cascade Speculative Drafting ☆33 · Apr 2, 2024 · Updated 2 years ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion, from our EMNLP 2023 paper - Accelerating Toeplitz… ☆14 · Oct 17, 2023 · Updated 2 years ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆99 · Aug 20, 2023 · Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆65 · Sep 28, 2024 · Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆218 · Mar 5, 2026 · Updated last month
- Variable-order CRFs with structure learning ☆17 · Aug 1, 2024 · Updated last year
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆60 · Nov 20, 2024 · Updated last year
- ☆26 · Nov 23, 2023 · Updated 2 years ago
- scalable and robust tree-based speculative decoding algorithm ☆377 · Jan 28, 2025 · Updated last year
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- Calculating Expected Time for training LLM. ☆39 · Apr 17, 2023 · Updated 3 years ago
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers ☆193 · Feb 28, 2023 · Updated 3 years ago
- Open Source Projects from Pallas Lab ☆21 · Oct 10, 2021 · Updated 4 years ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Apr 30, 2023 · Updated 3 years ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆389 · Apr 22, 2025 · Updated last year
- Implementation of QKVAE ☆11 · Feb 24, 2023 · Updated 3 years ago
- lanmt ebm ☆12 · Jun 19, 2020 · Updated 5 years ago
- ☆21 · Feb 5, 2024 · Updated 2 years ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Jan 23, 2024 · Updated 2 years ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,727 · Jun 25, 2024 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆56 · Feb 24, 2026 · Updated 2 months ago
- Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) ☆47 · Dec 9, 2023 · Updated 2 years ago
- Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC ☆54 · Jan 20, 2026 · Updated 3 months ago
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- [ICCAD 2025] Squant ☆15 · Jul 3, 2025 · Updated 9 months ago
- ☆26 · May 30, 2023 · Updated 2 years ago
- ☆17 · Dec 19, 2024 · Updated last year
- Bag of Instances Aggregation Boosts Self-supervised Distillation (ICLR 2022) ☆33 · Apr 26, 2022 · Updated 4 years ago
- Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024) ☆10 · Oct 16, 2024 · Updated last year
- A repository for research on medium sized language models. ☆78 · May 23, 2024 · Updated last year
- source code of COLING2020 "Second-Order Unsupervised Neural Dependency Parsing" ☆16 · Oct 24, 2022 · Updated 3 years ago