This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE"
☆37Oct 5, 2025Updated 5 months ago
Alternatives and similar repositories for Jakiro
Users that are interested in Jakiro are comparing it to the libraries listed below.
- Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)☆55Mar 14, 2025Updated last year
- ☆25Mar 15, 2023Updated 3 years ago
- BigBang-Proton is an LLM pretrained on cross-scale, cross-structure, cross-discipline real-world scientific tasks to construct a scienti…☆22Nov 8, 2025Updated 4 months ago
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs☆14Apr 3, 2025Updated 11 months ago
- ☆64Mar 21, 2026Updated last week
- ☆35Nov 18, 2025Updated 4 months ago
- ☆20Jun 17, 2024Updated last year
- ☆82Updated this week
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification☆76Jul 14, 2025Updated 8 months ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️☆1,163Mar 9, 2026Updated 2 weeks ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆31Mar 12, 2024Updated 2 years ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation☆34May 28, 2025Updated 10 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…☆68Jun 26, 2024Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning☆32Jan 25, 2026Updated 2 months ago
- UQ: Assessing Language Models on Unsolved Questions☆30Aug 26, 2025Updated 7 months ago
- Source code of paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing"☆31Oct 24, 2024Updated last year
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆13Apr 17, 2025Updated 11 months ago
- Rhetorical sentence classification using LLMs☆11Oct 26, 2025Updated 5 months ago
- ☆21Oct 10, 2025Updated 5 months ago
- ☆49Mar 20, 2026Updated last week
- [ICML 2025] DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization☆20May 24, 2025Updated 10 months ago
- LaTeX template for dissertation proposals in Peking University Shenzhen.☆15Feb 23, 2022Updated 4 years ago
- A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech…☆78Nov 4, 2025Updated 4 months ago
- [Main EMNLP'25] LLMs do Multi-Label Classification Differently☆14Feb 28, 2026Updated last month
- Intelligent Resource Requirement Estimation and Scheduling for Deep Learning Jobs on Distributed GPU Clusters☆15Nov 18, 2021Updated 4 years ago
- Reading notes on Speculative Decoding papers☆27Feb 24, 2026Updated last month
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding☆278Aug 31, 2024Updated last year
- A cross-platform Python CPU performance benchmarking tool and leaderboard.☆15Nov 11, 2023Updated 2 years ago
- ☆31Jul 21, 2025Updated 8 months ago
- Prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs☆16Feb 17, 2026Updated last month
- Information extraction from unstructured text to build a knowledge graph using techniques from traditional NLP to pre-trained transformer…☆16Jan 13, 2026Updated 2 months ago
- This project leverages advanced AI agents from crewAI to assist doctors in diagnosing medical conditions and recommending treatment plans…☆14Nov 16, 2024Updated last year
- ☆13Jan 14, 2020Updated 6 years ago
- Fast inference from large language models via speculative decoding☆904Aug 22, 2024Updated last year
- Landing page + leaderboard for SWE-Bench benchmark☆12Mar 4, 2026Updated 3 weeks ago
- [ICLR2025] Code and data for paper: Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasonin…☆40Mar 10, 2025Updated last year
- ITKGrowCut is a remote module for ITK. It segments a 3D image from user-provided foreground and background seeds.☆15Nov 15, 2025Updated 4 months ago
- Simulating Distributed Training at Scale☆14Sep 15, 2025Updated 6 months ago