amirzandieh / QJL
QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead
☆31 · Jan 27, 2025 · Updated last year
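The idea named in the repository title — project key vectors with a Johnson-Lindenstrauss (random Gaussian) matrix and keep only the sign bits — can be sketched with the standard sign-random-projection estimator. This is an illustrative sketch, not the repository's actual code; the dimensions `d` and `m` and all function names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 8192  # key dimension and projection dimension (illustrative values)

# Johnson-Lindenstrauss projection matrix with i.i.d. Gaussian entries
S = rng.standard_normal((m, d))

def quantize_key(k):
    """Store only the sign bits of the projected key, plus its norm."""
    return np.sign(S @ k), np.linalg.norm(k)

def estimate_inner_product(q, key_bits, key_norm):
    """Estimate <q, k> from the 1-bit quantized key.

    For a Gaussian row s, E[(s @ q) * sign(s @ k)] = sqrt(2/pi) * <q, k> / ||k||,
    so averaging over rows and rescaling by sqrt(pi/2) * ||k|| estimates <q, k>.
    """
    return np.sqrt(np.pi / 2) * key_norm * ((S @ q) @ key_bits) / m

# quick comparison against the exact inner product
q = rng.standard_normal(d)
k = rng.standard_normal(d)
bits, norm = quantize_key(k)
est, exact = estimate_inner_product(q, bits, norm), q @ k
```

With `m = 8192` the estimator's standard deviation is roughly `1.25 * ||q|| * ||k|| / sqrt(m)`, i.e. a percent or two of `||q||·||k||`; a larger `m` tightens the estimate at the cost of more sign bits per cached key.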
Alternatives and similar repositories for QJL
Users interested in QJL are comparing it to the libraries listed below.
- Residual vector quantization for KV cache compression in large language models · ☆11 · Oct 22, 2024 · Updated last year
- 4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024) · ☆13 · Feb 13, 2025 · Updated last year
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference · ☆82 · Dec 7, 2025 · Updated 2 months ago
- Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent · ☆17 · Sep 8, 2022 · Updated 3 years ago
- Single-threaded, end-to-end C++ implementation of the BitNet (1.58-bit weight) model · ☆14 · Nov 17, 2024 · Updated last year
- ☆15 · Jun 4, 2024 · Updated last year
- Johnson-Lindenstrauss transform (JLT), random projections (RP), fast Johnson-Lindenstrauss transform (FJLT), and randomized Hadamard tran… · ☆22 · Jul 11, 2023 · Updated 2 years ago
- ☆52 · Nov 5, 2024 · Updated last year
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention · ☆52 · Aug 6, 2025 · Updated 6 months ago
- ☆22 · Mar 7, 2025 · Updated 11 months ago
- Manages the vllm-nccl dependency · ☆17 · Jun 3, 2024 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ☆123 · Jul 4, 2025 · Updated 7 months ago
- ☆20 · Sep 28, 2024 · Updated last year
- [SIGMOD 2025] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Ne… · ☆61 · Jun 4, 2025 · Updated 8 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity · ☆93 · Sep 4, 2024 · Updated last year
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models · ☆25 · Oct 5, 2024 · Updated last year
- [NAACL 2022] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers · ☆21 · May 16, 2023 · Updated 2 years ago
- ☆25 · Oct 31, 2024 · Updated last year
- Fast matrix multiplications for lookup-table-quantized LLMs · ☆388 · Apr 13, 2025 · Updated 10 months ago
- A model serving framework for various research and production scenarios, seamlessly built on the PyTorch and HuggingFace ecosystems · ☆23 · Oct 11, 2024 · Updated last year
- KV cache compression for high-throughput LLM inference · ☆154 · Feb 5, 2025 · Updated last year
- A suite for parallel inference of Diffusion Transformers (DiTs) on multi-GPU clusters · ☆56 · Jul 23, 2024 · Updated last year
- Implementation of Hierarchical Clustering-based Nearest Neighbor Graphs · ☆22 · Feb 6, 2020 · Updated 6 years ago
- ☆21 · Apr 17, 2025 · Updated 10 months ago
- Keyformer proposes KV cache reduction through key-token identification, without the need for fine-tuning · ☆58 · Mar 26, 2024 · Updated last year
- An auxiliary project analyzing the characteristics of KV in DiT attention · ☆32 · Nov 29, 2024 · Updated last year
- Implementation of the NeurIPS 2023 paper "Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval" · ☆11 · Nov 12, 2024 · Updated last year
- PQ Fast Scan · ☆70 · May 31, 2019 · Updated 6 years ago
- ☆40 · Mar 28, 2024 · Updated last year
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆142 · Dec 4, 2024 · Updated last year
- ☆85 · Oct 17, 2025 · Updated 4 months ago
- [ECCV 2024] AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer · ☆41 · Dec 9, 2024 · Updated last year
- Official repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …" · ☆36 · Aug 29, 2025 · Updated 5 months ago
- ☆35 · Dec 12, 2023 · Updated 2 years ago
- Kinematic and dynamic models of continuum and articulated soft robots · ☆15 · Nov 22, 2025 · Updated 2 months ago
- Source code for the paper "Memory-Efficient Fine-Tuning via Low-Rank Activation Compression" · ☆13 · Aug 1, 2025 · Updated 6 months ago
- Hands-on with the Hygon DCU, a domestically produced (Chinese) accelerator card (LLM training, fine-tuning, inference, etc.) · ☆70 · Aug 10, 2025 · Updated 6 months ago
- Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models · ☆483 · Nov 26, 2024 · Updated last year
- A quantization algorithm for LLMs · ☆148 · Jun 21, 2024 · Updated last year