IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆90 Updated last year
Alternatives and similar repositories for Sparse-Marlin
Users who are interested in Sparse-Marlin are comparing it to the libraries listed below.
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆136 Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆175 Updated last year
- PyTorch bindings for CUTLASS grouped GEMM.