This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" [ACL 2026 Main Accepted]
☆38Oct 5, 2025Updated 7 months ago
Alternatives and similar repositories for Jakiro
Users that are interested in Jakiro are comparing it to the libraries listed below.
- Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)☆57Mar 14, 2025Updated last year
- ☆25Mar 15, 2023Updated 3 years ago
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs☆14Apr 3, 2025Updated last year
- ☆37Nov 18, 2025Updated 5 months ago
- ☆20Jun 17, 2024Updated last year
- ☆16Jul 31, 2025Updated 9 months ago
- ☆45May 27, 2025Updated 11 months ago
- Official Implementation of LANTERN (ICLR'25) and LANTERN++(ICLRW-SCOPE'25)☆19Mar 5, 2025Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆31Mar 12, 2024Updated 2 years ago
- (ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation☆35May 28, 2025Updated 11 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).☆2,313Feb 20, 2026Updated 2 months ago
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…☆68Jun 26, 2024Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning☆32Jan 25, 2026Updated 3 months ago
- UQ: Assessing Language Models on Unsolved Questions☆30Aug 26, 2025Updated 8 months ago
- Source code of paper ''KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing''☆31Oct 24, 2024Updated last year
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆13Apr 17, 2025Updated last year
- GPU topology-aware scheduler☆13Jul 7, 2017Updated 8 years ago
- ☆26Dec 5, 2022Updated 3 years ago
- ☆50Mar 20, 2026Updated last month
- LaTeX template for dissertation proposals in Peking University Shenzhen.☆15Feb 23, 2022Updated 4 years ago
- An ITK implementation of the GraphCut framework. See 'Graph cuts and efficient ND image segmentation' by Boykov and Funka-Lea and 'Intera…☆12Sep 18, 2017Updated 8 years ago
- Intelligent Resource Requirement Estimation and Scheduling for Deep Learning Jobs on Distributed GPU Clusters☆15Nov 18, 2021Updated 4 years ago
- Reading notes on Speculative Decoding papers☆32Apr 16, 2026Updated 3 weeks ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding☆279Aug 31, 2024Updated last year
- A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech…☆81Nov 4, 2025Updated 6 months ago
- ☆30Jul 21, 2025Updated 9 months ago
- Prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs☆18Feb 17, 2026Updated 2 months ago
- Landing page + leaderboard for SWE-Bench benchmark☆12Mar 29, 2026Updated last month
- Fast inference from large language models via speculative decoding☆914Aug 22, 2024Updated last year
- Simulating Distributed Training at Scale☆14Sep 15, 2025Updated 7 months ago
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths☆18Jul 10, 2025Updated 9 months ago
- ☆13Mar 6, 2023Updated 3 years ago
- ☆27Jun 22, 2024Updated last year
- [TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…☆18Apr 27, 2026Updated last week
- AI-powered system generating knowledge graphs from text and answering questions.☆25Oct 25, 2024Updated last year
- Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification☆17Jul 13, 2025Updated 9 months ago
- PyTorch-UVM on super-large language models.☆17Dec 21, 2020Updated 5 years ago
- Converting unstructured text to a knowledge graph using spacy LLM and Neo4j.☆17Feb 1, 2024Updated 2 years ago