Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
☆102 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for Speculative-Decoding
Users who are interested in Speculative-Decoding are comparing it to the libraries listed below.
- ☆15 · Aug 19, 2024 · Updated last year
- minimal C implementation of speculative decoding based on llama2.c ☆28 · Jul 15, 2024 · Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆1,163 · Mar 9, 2026 · Updated 2 weeks ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language… ☆14 · Dec 16, 2024 · Updated last year
- DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting ☆17 · Mar 4, 2025 · Updated last year
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆376 · Apr 22, 2025 · Updated 11 months ago
- Reading list for multimodal sequence learning ☆14 · Sep 4, 2023 · Updated 2 years ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆221 · Feb 13, 2025 · Updated last year
- An Open-source Factuality Evaluation Demo for LLMs ☆23 · Feb 23, 2026 · Updated last month
- This is the official repo for the CVPR 2021 L2ID paper "Distill on the Go: Online knowledge distillation in self-supervised learning" ☆12 · Nov 15, 2021 · Updated 4 years ago
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered … ☆17 · Apr 1, 2025 · Updated 11 months ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ☆69 · May 15, 2025 · Updated 10 months ago
- ☆15 · Jun 26, 2024 · Updated last year
- [IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA ☆16 · Nov 20, 2024 · Updated last year
- From-Classification-to-Clinical ☆12 · Apr 26, 2024 · Updated last year
- Rhetorical sentence classification using LLMs ☆11 · Oct 26, 2025 · Updated 5 months ago
- ☆17 · Mar 23, 2023 · Updated 3 years ago
- ☆14 · Dec 4, 2020 · Updated 5 years ago
- ☆28 · May 24, 2025 · Updated 10 months ago
- ☆19 · Feb 18, 2025 · Updated last year
- [EMNLP 2024] PsyGUARD: An Automated System for Suicide Detection and Risk Assessment in Psychological Counseling ☆21 · Apr 21, 2025 · Updated 11 months ago
- [Main EMNLP'25] LLMs do Multi-Label Classification Differently ☆14 · Feb 28, 2026 · Updated 3 weeks ago
- KV cache compression via sparse coding ☆17 · Oct 26, 2025 · Updated 5 months ago
- ☆21 · Apr 10, 2023 · Updated 2 years ago
- GitHub Repository for KDD 2022 paper "Saliency-Regularized Deep Multi-Task Learning" ☆12 · Sep 26, 2023 · Updated 2 years ago
- Prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs ☆16 · Feb 17, 2026 · Updated last month
- Information extraction from unstructured text to build a knowledge graph using techniques from traditional NLP to pre-trained transformer… ☆16 · Jan 13, 2026 · Updated 2 months ago
- Project-based learning tutorials to help you build AI-native applications from scratch. ☆30 · Jan 2, 2026 · Updated 2 months ago
- ☆14 · Aug 3, 2024 · Updated last year
- This project leverages advanced AI agents from crewAI to assist doctors in diagnosing medical conditions and recommending treatment plans… ☆14 · Nov 16, 2024 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,324 · Mar 6, 2025 · Updated last year
- GPU operators for sparse tensor operations ☆35 · Mar 11, 2024 · Updated 2 years ago
- MIPS simulator in C++ ☆13 · Oct 2, 2019 · Updated 6 years ago
- Implementation of a cache memory in verilog ☆15 · Dec 5, 2017 · Updated 8 years ago
- ITKGrowCut is a remote module for ITK. It segments a 3D image from user-provided foreground and background seeds. ☆15 · Nov 15, 2025 · Updated 4 months ago
- ☆26 · Mar 14, 2024 · Updated 2 years ago
- Tools to simplify life with AI ☆30 · Apr 4, 2025 · Updated 11 months ago
- ☆17 · Apr 20, 2025 · Updated 11 months ago
- Programming for AI Practice ☆13 · Dec 13, 2020 · Updated 5 years ago