Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
☆110 · Dec 2, 2024 · Updated last year
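In speculative decoding as described by Leviathan et al. (2023), a small draft model proposes several tokens and the large target model scores them all in one parallel pass. Each drafted token x is kept with probability min(1, p(x)/q(x)), where q is the draft distribution and p is the target distribution. The first rejected position is resampled from the normalized residual max(0, p − q). The following is a minimal pure-Python sketch of that accept/reject step over toy probability vectors. It is not code from this repository, and the helper names `sample` and `verify_draft` are hypothetical.

```python
import random
from typing import List, Sequence

# A toy "distribution" is a list of probabilities indexed by token id.
Dist = Sequence[float]


def sample(dist: Dist, rng: random.Random) -> int:
    """Draw one token id from an (optionally unnormalized) weight vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def verify_draft(
    draft_tokens: Sequence[int],
    q_dists: Sequence[Dist],
    p_dists: Sequence[Dist],
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: accept or reject gamma drafted tokens.

    draft_tokens[i] was sampled from the draft distribution q_dists[i].
    p_dists holds gamma + 1 target distributions for the same prefixes;
    the last one conditions on all gamma drafted tokens.
    """
    out: List[int] = []
    for i, x in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Keep the drafted token with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the residual max(0, p - q). choices()
        # normalizes the weights, so no explicit division is needed.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(sample(residual, rng))
        return out
    # Every draft was accepted: take one bonus token from the target model.
    out.append(sample(p_dists[len(draft_tokens)], rng))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.5, 0.5]  # toy draft distribution (assumed values)
    p = [0.8, 0.2]  # toy target distribution (assumed values)
    draft = [sample(q, rng)]
    print(verify_draft(draft, [q], [p, p], rng))
```

A rejection ends the loop, so one target-model pass yields between 1 and gamma + 1 tokens. The residual resampling keeps the output distribution identical to sampling from the target model alone.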
Alternatives and similar repositories for Speculative-Decoding
Users who are interested in Speculative-Decoding are comparing it to the libraries listed below.
- Explorations into some recent techniques surrounding speculative decoding · ☆305 · Dec 22, 2024 · Updated last year
- ☆16 · Aug 19, 2024 · Updated last year
- Fast inference from large language models via speculative decoding · ☆916 · Aug 22, 2024 · Updated last year
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ · ☆1,253 · Jun 2, 2026 · Updated 2 weeks ago
- Official PyTorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language… · ☆14 · Dec 16, 2024 · Updated last year
- DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting · ☆18 · Mar 4, 2025 · Updated last year
- Official implementation of DART (DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference) · ☆60 · Feb 8, 2026 · Updated 4 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2 · ☆99 · Aug 20, 2023 · Updated 2 years ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind · ☆111 · Feb 29, 2024 · Updated 2 years ago
- Code release for RICA^2: Rubric-Informed, Calibrated Assessment of Actions (ECCV 2024) · ☆15 · Nov 9, 2025 · Updated 7 months ago
- Security-native LLM system for AI-generated application security · ☆263 · Jun 4, 2026 · Updated last week
- Beijing University of Posts and Telecommunications (BUPT) job-hunting repository, continuously updated · ☆23 · Sep 5, 2025 · Updated 9 months ago
- PyTorch implementation of the paper "Decomposing Vision Transformers for Collaborative Inference in Edge Devices" · ☆18 · Jul 27, 2024 · Updated last year
- ☆40 · Oct 21, 2025 · Updated 7 months ago
- Reading list for multimodal sequence learning · ☆14 · Sep 4, 2023 · Updated 2 years ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** · ☆227 · Feb 13, 2025 · Updated last year
- A pip package implementing reinforcement learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit · ☆16 · Jun 28, 2024 · Updated last year
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" · ☆39 · Mar 4, 2024 · Updated 2 years ago
- Serverless LLM Inference: Deploy DeepSeek R1 & LLaMA Models on AWS Lambda with Ultra-Fast Cold Starts · ☆13 · Feb 3, 2026 · Updated 4 months ago
- Demo repository for all the different ways to do eBPF tracing · ☆18 · Feb 9, 2026 · Updated 4 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models · ☆111 · Nov 22, 2025 · Updated 6 months ago
- RATT: A Thought Structure for Coherent and Correct LLM Reasoning · ☆15 · Jul 11, 2024 · Updated last year
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models · ☆32 · Oct 6, 2025 · Updated 8 months ago
- High-performance sparse mask attention CUDA operator for short sequences, optimized for sequences shorter than 1K with 75% sparsity · ☆79 · Mar 18, 2026 · Updated 2 months ago
- [BMVC 2022] Information Theoretic Representation Distillation · ☆19 · Oct 6, 2023 · Updated 2 years ago
- The Search for Sparse, Robust Neural Networks · ☆11 · Mar 24, 2023 · Updated 3 years ago
- Implementation of unregularized, L1-regularized, and L2-regularized linear regression using NumPy, without scikit-learn · ☆11 · Oct 4, 2019 · Updated 6 years ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving · ☆86 · Sep 15, 2025 · Updated 9 months ago
- PhysPatch performs physical memory scanning and patching of the entire Windows kernel using DMA · ☆13 · Nov 10, 2024 · Updated last year
- ☆15 · Jun 26, 2024 · Updated last year
- [ICML'25] Official code for the paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an… · ☆13 · Apr 17, 2025 · Updated last year
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022) · ☆46 · Oct 17, 2022 · Updated 3 years ago
- [IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA · ☆16 · Nov 20, 2024 · Updated last year
- Rust library for conversion to/from proquints · ☆14 · Feb 9, 2018 · Updated 8 years ago
- Run Arm64/x86-64 Linux ELF binaries on macOS Apple Silicon · ☆155 · Updated this week
- Meet Rustacean GPT, an experimental project transforming OpenAI's GPT into a helpful, autonomous software engineer to support senior deve… · ☆15 · May 10, 2023 · Updated 3 years ago
- Official repository for the paper "MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification" @ EMNLP 2024 · ☆24 · May 1, 2025 · Updated last year
- ☆18 · Mar 23, 2023 · Updated 3 years ago
- Fluent CLI is an advanced command-line interface designed to interact seamlessly with multiple workflow systems like FlowiseAI, Langflow,… · ☆34 · Jan 23, 2026 · Updated 4 months ago