QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead
☆97 · Jan 27, 2025 · Updated last year
Alternatives and similar repositories for QJL
Users that are interested in QJL are comparing it to the libraries listed below.
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers ☆21 · May 16, 2023 · Updated 3 years ago
- ☆23 · Mar 7, 2025 · Updated last year
- 4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024) ☆13 · Feb 13, 2025 · Updated last year
- Single-thread, end-to-end C++ implementation of the Bitnet (1.58-bit weight) model ☆14 · Nov 17, 2024 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference ☆90 · Dec 7, 2025 · Updated 5 months ago
- This is the implementation of the Hierarchical Clustering-based Nearest Neighbor Graphs ☆22 · Feb 6, 2020 · Updated 6 years ago
- Quick ADC ☆27 · May 31, 2019 · Updated 6 years ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆391 · Apr 13, 2025 · Updated last year
- ☆14 · Jun 4, 2024 · Updated last year
- A simple script to plot the Roofline model for given HW platforms and applications ☆10 · Mar 17, 2026 · Updated 2 months ago
- ☆25 · Oct 31, 2024 · Updated last year
- Residual vector quantization for KV cache compression in large language models ☆12 · Oct 22, 2024 · Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆384 · Jul 10, 2025 · Updated 10 months ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.