Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
☆109 Dec 2, 2024 Updated last year
Alternatives and similar repositories for Speculative-Decoding
Users that are interested in Speculative-Decoding are comparing it to the libraries listed below.
- Fast inference from large language models via speculative decoding ☆915 Aug 22, 2024 Updated last year
- minimal C implementation of speculative decoding based on llama2.c ☆30 Jul 15, 2024 Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆1,222 May 11, 2026 Updated 2 weeks ago
- Official Implementation of DART (DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference). ☆57 Feb 8, 2026 Updated 3 months ago
- Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu… ☆93 Mar 6, 2026 Updated 2 months ago
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆392 Apr 22, 2025 Updated last year
- Reading list for multimodal sequence learning ☆14 Sep 4, 2023 Updated 2 years ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆227 Feb 13, 2025 Updated last year
- Code for "The Whole Truth and Nothing But the Truth: Faithful and Controllable Dialogue Response Generation with Dataflow Transduction an… ☆10 Apr 30, 2024 Updated 2 years ago
- A guide to structured generation using constrained decoding ☆18 Jun 9, 2024 Updated last year
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆39 Mar 4, 2024 Updated 2 years ago
- Serverless LLM Inference: Deploy DeepSeek R1 & LLaMA Models on AWS Lambda with Ultra-Fast Cold Starts ☆13 Feb 3, 2026 Updated 3 months ago
- Single shot neural network pruning before training the model, based on connection sensitivity ☆11 Aug 7, 2019 Updated 6 years ago
- Demo repository for all the different ways to do eBPF Tracing ☆17 Feb 9, 2026 Updated 3 months ago
- Demo Repository for eBPF XDP Unit Test ☆12 Oct 24, 2024 Updated last year
- ☆17 Jul 11, 2023 Updated 2 years ago
- Run Arm64/Linux ELF binaries on macOS Apple Silicon ☆122 Updated this week
- Implementation of the iPiano algorithm for non-convex and non-smooth optimization as described in [1]. ☆12 Nov 28, 2018 Updated 7 years ago
- Bias Mimicking: A simple sampling approach for Bias Mitigation (CVPR 23) ☆14 Aug 6, 2023 Updated 2 years ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ☆71 May 15, 2025 Updated last year
- The Search for Sparse, Robust Neural Networks ☆11 Mar 24, 2023 Updated 3 years ago
- Implementation of unregularized, l1 regularized and l2 regularized linear regression using numpy and without sklearn ☆12 Oct 4, 2019 Updated 6 years ago
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere… ☆22 May 9, 2025 Updated last year
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an… ☆13 Apr 17, 2025 Updated last year
- ☆15 Jun 26, 2024 Updated last year
- ☆22 Updated this week
- Rhetorical sentence classification using LLMs ☆11 Oct 26, 2025 Updated 7 months ago
- Official Repository for the paper 'MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification' @ EMNLP 2024 ☆24 May 1, 2025 Updated last year
- ☆20 May 14, 2025 Updated last year
- ☆30 May 24, 2025 Updated last year
- ☆19 Feb 18, 2025 Updated last year
- KV cache compression via sparse coding ☆17 Oct 26, 2025 Updated 7 months ago
- Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) ☆47 Dec 9, 2023 Updated 2 years ago
- ☆43 Dec 19, 2025 Updated 5 months ago
- Prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs ☆18 Feb 17, 2026 Updated 3 months ago
- 3D globe wallpaper for macOS with live weather, flights, pollen, and day/night cycle ☆60 Feb 14, 2026 Updated 3 months ago
- This project leverages advanced AI agents from crewAI to assist doctors in diagnosing medical conditions and recommending treatment plans… ☆15 Nov 16, 2024 Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,337 Mar 6, 2025 Updated last year
- GPU operators for sparse tensor operations ☆37 Mar 11, 2024 Updated 2 years ago