QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead
☆38Jan 27, 2025Updated last year
Alternatives and similar repositories for QJL
Users that are interested in QJL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆23Mar 7, 2025Updated last year
- [SIGMOD 2025] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Ne…☆63Jun 4, 2025Updated 9 months ago
- 4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024)☆13Feb 13, 2025Updated last year
- Single-thread, end-to-end C++ implementation of the Bitnet (1.58-bit weight) model☆14Nov 17, 2024Updated last year
- Triton Implementation of HyperAttention Algorithm☆48Dec 11, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference☆83Dec 7, 2025Updated 3 months ago
- This is the implementation of the Hierarquical Clustering-based Nearest Neighbor Graphs☆22Feb 6, 2020Updated 6 years ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆30Sep 12, 2025Updated 6 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs☆389Apr 13, 2025Updated 11 months ago
- A simple script to plot the Roofline model for given HW platforms and applications☆10Mar 17, 2026Updated last week
- ☆14Jun 4, 2024Updated last year
- ☆25Oct 31, 2024Updated last year
- Residual vector quantization for KV cache compression in large language model☆12Oct 22, 2024Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference☆381Jul 10, 2025Updated 8 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent☆16Sep 8, 2022Updated 3 years ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters☆58Jul 23, 2024Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs☆123Jul 4, 2025Updated 8 months ago
- ☆52Nov 5, 2024Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity☆94Sep 4, 2024Updated last year
- KV cache compression for high-throughput LLM inference☆154Feb 5, 2025Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache☆363Nov 20, 2025Updated 4 months ago
- ☆18Mar 19, 2026Updated last week
- 用AI从0开始制作“研究生模拟器”小游戏☆42Feb 27, 2026Updated last month
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning☆57Mar 26, 2024Updated 2 years ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation☆251Dec 16, 2024Updated last year
- ☆20Aug 20, 2025Updated 7 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding☆145Dec 4, 2024Updated last year
- Implement some method of LLM KV Cache Sparsity☆41Jun 6, 2024Updated last year
- ☆42Mar 28, 2024Updated 2 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention.☆34Nov 29, 2024Updated last year
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …☆37Aug 29, 2025Updated 7 months ago
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops☆30Mar 16, 2024Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- A minimal implementation of spotify/annoy in pure rust☆11Mar 2, 2023Updated 3 years ago
- ☆19Jan 26, 2025Updated last year
- ☆15Sep 2, 2020Updated 5 years ago
- ECE Department of Tehran University - Computer Network Lab. Under the supervision of Dr. Ahmad Khonsari.☆21Mar 13, 2025Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆61Feb 7, 2025Updated last year
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models☆24Oct 5, 2024Updated last year
- ☆82Dec 1, 2023Updated 2 years ago