[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
☆31Mar 30, 2025Updated 11 months ago
Alternatives and similar repositories for ZipCache
Users that are interested in ZipCache are comparing it to the libraries listed below
Sorting:
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆53Mar 25, 2025Updated 11 months ago
- [ICCV 2025] The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"☆59Apr 5, 2025Updated 11 months ago
- torch_quantizer is a out-of-box quantization tool for PyTorch models on CUDA backend, specially optimized for Diffusion Models.☆25Mar 29, 2024Updated last year
- CoV: Chain-of-View Prompting for Spatial Reasoning☆52Jan 23, 2026Updated last month
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models☆24Oct 5, 2024Updated last year
- ☆11Sep 7, 2024Updated last year
- ☆52May 13, 2024Updated last year
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models☆103Mar 12, 2024Updated 2 years ago
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM☆14Dec 27, 2023Updated 2 years ago
- The official implementation of BiViT: Extremely Compressed Binary Vision Transformers☆16Jun 18, 2023Updated 2 years ago
- The Official Implementation of Ada-KV [NeurIPS 2025]☆128Nov 26, 2025Updated 3 months ago
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs☆82Jan 17, 2026Updated 2 months ago
- Tender: Accelerating Large Language Models via Tensor Decompostion and Runtime Requantization (ISCA'24)☆31Jul 4, 2024Updated last year
- This is the official PyTorch implementation of "BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation."☆40Oct 9, 2025Updated 5 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated 2 years ago
- Benchmarking Attention Mechanism in Vision Transformers.☆20Oct 10, 2022Updated 3 years ago
- Learning to Skip the Middle Layers of Transformers☆17Aug 7, 2025Updated 7 months ago
- Implement FlashAttention v2 with minimal code to learn.☆15Jun 12, 2024Updated last year
- GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM☆180Jul 12, 2024Updated last year
- [ICCV2025] The official code of "DreamRelation: Relation-Centric Video Customization"☆28Feb 4, 2026Updated last month
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆24Mar 4, 2025Updated last year
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆45Apr 18, 2025Updated 11 months ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving☆335Jul 2, 2024Updated last year
- [ICLR 2025] Official PyTorch implmentation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stit…☆104Feb 26, 2024Updated 2 years ago
- Tiny optimized Stable-diffusion that can run on GPUs with just 1GB of VRAM. (Beta)☆182Jul 20, 2023Updated 2 years ago
- [ICML‘2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen☆17Sep 7, 2024Updated last year
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model☆107Mar 24, 2025Updated 11 months ago
- An example project showing how to build a pip-installable Python package that invokes custom CUDA/C++ code☆14Jul 12, 2017Updated 8 years ago
- Streaming Video Diffusion: Online Video Editing with Diffusion Models☆18Jun 3, 2024Updated last year
- This repository implements the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations"☆20Aug 30, 2021Updated 4 years ago
- ☆47Nov 25, 2024Updated last year
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache☆359Nov 20, 2025Updated 4 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆31Mar 12, 2024Updated 2 years ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers☆25Feb 26, 2025Updated last year
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆53Aug 9, 2024Updated last year
- The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).☆22Aug 2, 2025Updated 7 months ago
- [EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆104Nov 9, 2024Updated last year
- Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.☆489Nov 26, 2024Updated last year
- ☆12Sep 1, 2023Updated 2 years ago