Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
☆106Dec 2, 2024Updated last year
Alternatives and similar repositories for Speculative-Decoding
Users that are interested in Speculative-Decoding are comparing it to the libraries listed below.
- Explorations into some recent techniques surrounding speculative decoding☆300Dec 22, 2024Updated last year
- ☆16Aug 19, 2024Updated last year
- Fast inference from large language models via speculative decoding☆914Aug 22, 2024Updated last year
- ☆12Sep 30, 2024Updated last year
- minimal C implementation of speculative decoding based on llama2.c☆29Jul 15, 2024Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️☆1,206Apr 18, 2026Updated 2 weeks ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆14Dec 16, 2024Updated last year
- Official Implementation of DART (DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference).☆55Feb 8, 2026Updated 2 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2.☆99Aug 20, 2023Updated 2 years ago
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…☆25Jul 21, 2024Updated last year
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆391Apr 22, 2025Updated last year
- ☆36Oct 21, 2025Updated 6 months ago
- Reading list for multimodal sequence learning☆14Sep 4, 2023Updated 2 years ago
- A Demo of Running Sleep-time Compute to Reduce LLM Latency☆16May 17, 2025Updated 11 months ago
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit.☆16Jun 28, 2024Updated last year
- Single shot neural network pruning before training the model, based on connection sensitivity☆11Aug 7, 2019Updated 6 years ago
- Bias Mimicking: A simple sampling approach for Bias Mitigation (CVPR 23)☆14Aug 6, 2023Updated 2 years ago
- This is the official repo for the CVPR 2021 L2ID paper "Distill on the Go: Online knowledge distillation in self-supervised learning"☆12Nov 15, 2021Updated 4 years ago
- "Flutter SampleApp" is an ideal beginning for your Flutter app development journey. Explore Flutter's potential with lab tutorials and pra…☆12Aug 11, 2025Updated 8 months ago
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …☆17Apr 1, 2025Updated last year
- RATT: A Thought Structure for Coherent and Correct LLM Reasoning☆16Jul 11, 2024Updated last year
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving☆83Sep 15, 2025Updated 7 months ago
- [BMVC 2022] Information Theoretic Representation Distillation☆19Oct 6, 2023Updated 2 years ago
- The Search for Sparse, Robust Neural Networks☆11Mar 24, 2023Updated 3 years ago
- Artifact package for CBMM paper (ATC'22)☆11Jun 5, 2022Updated 3 years ago
- Implementation of unregularized, l1 regularized and l2 regularized linear regression using numpy and without sklearn☆12Oct 4, 2019Updated 6 years ago
- ☆15Jun 26, 2024Updated last year
- ☆22Apr 24, 2026Updated last week
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- [IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA☆16Nov 20, 2024Updated last year
- ☆34Oct 13, 2025Updated 6 months ago
- ☆20May 14, 2025Updated 11 months ago
- ☆29May 24, 2025Updated 11 months ago
- ☆19Feb 18, 2025Updated last year
- An ITK implementation of the GraphCut framework. See 'Graph cuts and efficient ND image segmentation' by Boykov and Funka-Lea and 'Intera…☆12Sep 18, 2017Updated 8 years ago
- A new algorithm that formulates jailbreaking as a reasoning problem.☆26Jul 2, 2025Updated 10 months ago
- ScribePal is an Open Source intelligent browser extension that leverages AI to empower your web experience by providing contextual insigh…☆21Apr 6, 2026Updated last month
- ☆40Dec 19, 2025Updated 4 months ago
- GitHub Repository for KDD 2022 paper "Saliency-Regularized Deep Multi-Task Learning"☆12Sep 26, 2023Updated 2 years ago