haiduo/Jakiro

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/haiduo/Jakiro)

haiduo / Jakiro

This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE" [ACL 2026 Main Accepted]

☆37

Alternatives and similar repositories for Jakiro

Users that are interested in Jakiro are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

hgyhungry / alcop-artifact
View on GitHub
☆25Mar 15, 2023Updated 3 years ago
GAIR-NLP / LIMOPro
View on GitHub
☆15May 27, 2025Updated last year
shixun404 / Fault-Tolerant-SGEMM-on-NVIDIA-GPUs
View on GitHub
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
☆14Apr 3, 2025Updated last year
hemingkx / SpeculativeDecodingPapers
View on GitHub
📰 Must-read papers and blogs on Speculative Decoding ⚡️
☆1,283Jun 27, 2026Updated last month
sail-sg / LongSpec
View on GitHub
[ACL 2026 (Main)] LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆84Jul 14, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
TuringEyeTest / TuringEyeTest
View on GitHub
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18Aug 11, 2025Updated 11 months ago
pangjh3 / AnLLM
View on GitHub
☆20Jun 17, 2024Updated 2 years ago
rtenlab / gcaps-super-repo
View on GitHub
GCAPS: GPU Context-Aware Preemptive Scheduling Approach
☆16Mar 22, 2026Updated 4 months ago
supersymmetry-technologies / BigBang-Proton
View on GitHub
BigBang-Proton is a LLM pretrained on cross-scale, cross-structure, cross-discipline real-world scientific tasks to construct a scienti…
☆21Nov 8, 2025Updated 8 months ago
GATECH-EIC / Linearized-LLM
View on GitHub
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35Jun 12, 2024Updated 2 years ago
UNITES-Lab / Occult
View on GitHub
[ICML‘25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13Apr 17, 2025Updated last year
LLMkvsys / rethink-kv-compression
View on GitHub
☆24Mar 7, 2025Updated last year
yangyifei729 / KVSharer
View on GitHub
Source code of paper ''KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing''
☆31Oct 24, 2024Updated last year
HiEST / gpu-topo-aware
View on GitHub
GPU topology-aware scheduler
☆13Jul 7, 2017Updated 9 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
SWE-bench / swe-bench.github.io
View on GitHub
Landing page + leaderboard for SWE-Bench benchmark
☆15Mar 29, 2026Updated 4 months ago
shaadclt / Doctor-Assist-crewAI
View on GitHub
This project leverages advanced AI agents from crewAI to assist doctors in diagnosing medical conditions and recommending treatment plans…
☆15Nov 16, 2024Updated last year
NJUNLP / PATS
View on GitHub
☆47May 27, 2025Updated last year
GTyingzi / Compare_Adversial
View on GitHub
☆11Oct 29, 2022Updated 3 years ago
1KE-JI / UPFT
View on GitHub
Official resources of "The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reaso…
☆20Jun 13, 2025Updated last year
Infini-AI-Lab / TriForce
View on GitHub
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
☆281Aug 31, 2024Updated last year
jadohu / LANTERN
View on GitHub
Official Implementation of LANTERN (ICLR'25) and LANTERN++(ICLRW-SCOPE'25)
☆21Mar 5, 2025Updated last year
JingyangYi / ShorterBetter
View on GitHub
☆18Jul 31, 2025Updated 11 months ago
ziplab / Pyramid-Sparse-Attention
View on GitHub
Official PyTorch implementation of [PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation](https://arxiv.org/abs…
☆25Jan 25, 2026Updated 6 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
c85 / fhir-streamlit-template
View on GitHub
Streamlit template for building SMART on FHIR apps in the Cerner ecosystem.
☆11Sep 22, 2023Updated 2 years ago
InsightSoftwareConsortium / ITKGrowCut
View on GitHub
ITKGrowCut is a remote module for ITK. It segments a 3D image from user-provided foreground and background seeds.
☆16May 27, 2026Updated 2 months ago
feifeibear / LLMSpeculativeSampling
View on GitHub
Fast inference from large lauguage models via speculative decoding
☆921Aug 22, 2024Updated last year
rcorredorj / MPRSimpleViewer
View on GitHub
Simple MPR medical imaging viewer using VTK 6.0.1 and Qt 5.2.1
☆10Aug 15, 2014Updated 11 years ago
jaymody / speculative-sampling
View on GitHub
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆99Aug 20, 2023Updated 2 years ago
ROCm / gfx950-gluon-tutorials
View on GitHub
A practical guide to high-performance gluon kernel development on AMD GFX9 GPUs.
☆41Updated this week
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,481Feb 20, 2026Updated 5 months ago
Small-Model-Gap / Small-Model-Learnability-Gap
View on GitHub
☆23Oct 10, 2025Updated 9 months ago
aeroplanepaper / GRPO-LEAD
View on GitHub
☆40Nov 18, 2025Updated 8 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
AMD-AGI / TraceLens
View on GitHub
Automating analysis from trace files
☆85Updated this week
NickL77 / BaldEagle
View on GitHub
3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding
☆85Jul 3, 2025Updated last year
ROCm / Megatron-LM
View on GitHub
Ongoing research training transformer models at scale
☆43Updated this week
FYYFU / HeadKV
View on GitHub
[ICLR2025] Code and data for paper: Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasonin…
☆45Mar 10, 2025Updated last year
JiananHe / Dicom3DViewer
View on GitHub
本科毕设：基于VTK的三维可视化平台
☆12Jun 13, 2019Updated 7 years ago
NetX-lab / Echo
View on GitHub
Simulating Distributed Training at Scale
☆14Sep 15, 2025Updated 10 months ago
Zehong-Wang / TANS
View on GitHub
Can LLMs Convert Graphs to Text-Attributed Graphs? NAACL 25
☆17Mar 7, 2025Updated last year