ruipeterpan / marconi
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
☆47 · Updated 9 months ago
Alternatives and similar repositories for marconi
Users interested in marconi are comparing it to the repositories listed below.
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆70 · Updated 3 weeks ago
- ☆80 · Updated 2 months ago
- 16-fold memory access reduction with nearly no loss ☆109 · Updated 9 months ago
- ☆58 · Updated last year
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆157 · Updated this week
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆95 · Updated last week
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache ☆72 · Updated last week
- ☆83 · Updated 11 months ago
- ☆65 · Updated 8 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer