Cascade Speculative Drafting
☆32 · Apr 2, 2024 · Updated last year
Alternatives and similar repositories for CS-Drafting
Users interested in CS-Drafting are comparing it to the libraries listed below.
- Multi-Candidate Speculative Decoding ☆39 · Apr 22, 2024 · Updated last year
- ☆23 · Jan 27, 2025 · Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆214 · Sep 11, 2025 · Updated 5 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆114 · Mar 20, 2025 · Updated 11 months ago
- ☆21 · Jul 21, 2025 · Updated 7 months ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Feb 20, 2026 · Updated last week
- ☆11 · Feb 5, 2026 · Updated 3 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆214 · Feb 13, 2025 · Updated last year
- [WIP] AI Try-On plugin for Chrome ☆28 · Mar 16, 2024 · Updated last year
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆97 · Feb 6, 2024 · Updated 2 years ago
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… ☆31 · May 7, 2024 · Updated last year
- ☆32 · Jan 1, 2024 · Updated 2 years ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆25 · Feb 13, 2026 · Updated 2 weeks ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆65 · Sep 28, 2024 · Updated last year
- ☆129 · Jan 22, 2024 · Updated 2 years ago
- ☆68 · Aug 16, 2024 · Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆33 · Jun 2, 2023 · Updated 2 years ago
- Official Implementation of APB (ACL 2025 main Oral) and Spava. ☆34 · Jan 30, 2026 · Updated last month
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆64 · Feb 19, 2026 · Updated last week
- CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training ☆32 · Jul 20, 2022 · Updated 3 years ago
- Efficient Memory-Augmented Transformers ☆35 · Dec 5, 2022 · Updated 3 years ago
- Implementation of DoRA ☆308 · Jun 7, 2024 · Updated last year
- ☆596 · Aug 23, 2024 · Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆40 · Feb 5, 2024 · Updated 2 years ago
- Martingale posterior neural networks for fast sequential decision making @ NeurIPS 2025 ☆23 · Nov 13, 2025 · Updated 3 months ago
- Code for "APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training" ☆38 · Dec 23, 2025 · Updated 2 months ago
- ☆18 · Dec 30, 2025 · Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆358 · Feb 5, 2026 · Updated 3 weeks ago
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge. ☆85 · Oct 18, 2023 · Updated 2 years ago
- scalable and robust tree-based speculative decoding algorithm ☆370 · Jan 28, 2025 · Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆1,126 · Jan 24, 2026 · Updated last month
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 · Nov 17, 2024 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆163 · Apr 13, 2025 · Updated 10 months ago
- A virtual clinical environment for self‑evolving LLM diagnostic agents. ☆94 · Feb 12, 2026 · Updated 2 weeks ago
- Code repository for Black Mamba ☆262 · Feb 8, 2024 · Updated 2 years ago
- ☆14 · Mar 20, 2025 · Updated 11 months ago
- [ACL 2025] Knowledge Unlearning for Large Language Models ☆48 · Sep 18, 2025 · Updated 5 months ago
- Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference ☆45 · Nov 28, 2022 · Updated 3 years ago
- ☆10 · Nov 17, 2022 · Updated 3 years ago