xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59 · Updated 8 months ago
Alternatives and similar repositories for SpInfer
Users who are interested in SpInfer are comparing it to the libraries listed below.
- [HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆66 · Updated 3 weeks ago
- ☆83 · Updated 10 months ago
- ☆162 · Updated last year
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆94 · Updated 6 months ago
- ☆65 · Updated 7 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆36 · Updated last week
- Tile-based language built for AI computation across all scales ☆85 · Updated this week
- Implement Flash Attention using Cute. ☆97 · Updated 11 months ago
- ☆38 · Updated last month
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆146 · Updated 2 months ago
- LLM Inference with Microscaling Format ☆33 · Updated last year
- ☆19 · Updated last year
- ☆58 · Updated last year
- DeeperGEMM: crazy optimized version ☆73 · Updated 7 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆49 · Updated 11 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆227 · Updated 2 years ago
- ☆124 · Updated 3 months ago
- ☆58 · Updated last year
- ☆112 · Updated 6 months ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆25 · Updated last year
- Quantized Attention on GPU ☆44 · Updated last year
- A lightweight design for computation-communication overlap. ☆194 · Updated 2 months ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆55 · Updated 2 years ago
- NVIDIA cuTile learn ☆69 · Updated this week
- ☆60 · Updated last year
- ☆102 · Updated last year
- ☆81 · Updated last year
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆82 · Updated this week
- Debug print operator for cudagraph debugging ☆14 · Updated last year
- Building the Virtuous Cycle for AI-driven LLM Systems ☆95 · Updated last week