QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead
☆100 · Jan 27, 2025 · Updated last year
Alternatives and similar repositories for QJL
Users that are interested in QJL are comparing it to the libraries listed below.
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers · ☆21 · May 16, 2023 · Updated 3 years ago
- ☆24 · Mar 7, 2025 · Updated last year
- 4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024) · ☆13 · Feb 13, 2025 · Updated last year
- Single-thread, end-to-end C++ implementation of the Bitnet (1.58-bit weight) model · ☆15 · Nov 17, 2024 · Updated last year
- The repo has been moved to https://github.com/VectorDB-NTU/RaBitQ-Library. [SIGMOD 2025] Practical and Asymptotically Optimal Quantizatio… · ☆72 · Mar 30, 2026 · Updated 2 months ago
- Triton Implementation of HyperAttention Algorithm · ☆48 · Dec 11, 2023 · Updated 2 years ago
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference · ☆91 · Dec 7, 2025 · Updated 6 months ago
- This is the implementation of the Hierarchical Clustering-based Nearest Neighbor Graphs · ☆22 · Feb 6, 2020 · Updated 6 years ago
- Quick ADC · ☆27 · May 31, 2019 · Updated 7 years ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs · ☆389 · Apr 13, 2025 · Updated last year
- ☆14 · Jun 4, 2024 · Updated 2 years ago
- A simple script to plot the Roofline model for given HW platforms and applications · ☆10 · Mar 17, 2026 · Updated 3 months ago
- Residual vector quantization for KV cache compression in large language model · ☆12 · Oct 22, 2024 · Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference · ☆397 · Jul 10, 2025 · Updated 11 months ago
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent · ☆16 · Sep 8, 2022 · Updated 3 years ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆151 · Dec 4, 2024 · Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters · ☆58 · May 3, 2026 · Updated last month
- Official PyTorch implementation of SynergyNeRF: "Synergistic Integration of Coordinate Network and Tensorial Feature for Improving NeRFs … · ☆12 · Sep 23, 2024 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ☆129 · Jul 4, 2025 · Updated 11 months ago
- ☆54 · Nov 5, 2024 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆95 · Sep 4, 2024 · Updated last year
- Compression primitives for uplink compression in Federated Learning that are compatible with Secure Aggregation. · ☆11 · Jul 27, 2022 · Updated 3 years ago
- PQ Fast Scan · ☆70 · May 31, 2019 · Updated 7 years ago
- KV cache compression for high-throughput LLM inference · ☆158 · Feb 5, 2025 · Updated last year
- Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning · ☆57 · Mar 26, 2024 · Updated 2 years ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation · ☆254 · Dec 16, 2024 · Updated last year
- The repo has been moved to https://github.com/VectorDB-NTU/RaBitQ-Library. [SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with… · ☆251 · Apr 22, 2026 · Updated 2 months ago
- Towards Memorization-Free Diffusion Models (CVPR2024) Codebase · ☆11 · Jun 2, 2024 · Updated 2 years ago
- ☆23 · Aug 20, 2025 · Updated 10 months ago
- Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models. · ☆517 · Nov 26, 2024 · Updated last year
- Implement some method of LLM KV Cache Sparsity · ☆41 · Jun 6, 2024 · Updated 2 years ago
- ☆42 · Mar 28, 2024 · Updated 2 years ago
- [OSDI 2025] DecDEC: A Systems Approach to Advancing Low‑Bit LLM Quantization · ☆24 · Jan 29, 2026 · Updated 5 months ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention. · ☆34 · Nov 29, 2024 · Updated last year
- Official implementation of StochSync: a zero-shot approach for image generation in arbitrary spaces via stochastic diffusion synchronizat… · ☆21 · Jun 24, 2025 · Updated last year
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … · ☆39 · Aug 29, 2025 · Updated 10 months ago
- GitHub action that validates the syntax of selected RDF files in the repository · ☆12 · Feb 12, 2024 · Updated 2 years ago
- A minimal implementation of spotify/annoy in pure rust · ☆11 · Mar 2, 2023 · Updated 3 years ago
- ☆19 · Jan 26, 2025 · Updated last year