16-fold memory access reduction with nearly no loss
☆108 · Mar 26, 2025 · Updated last year
Alternatives and similar repositories for DoubleSparse
Users that are interested in DoubleSparse are comparing it to the libraries listed below.
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆145 · Dec 4, 2024 · Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆380 · Jul 10, 2025 · Updated 9 months ago
- An experimentation platform for LLM inference optimisation