[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
☆32Mar 30, 2025Updated 11 months ago
Alternatives and similar repositories for ZipCache
Users that are interested in ZipCache are comparing it to the libraries listed below
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆53Mar 25, 2025Updated 11 months ago
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models.☆23Mar 29, 2024Updated last year
- CoV: Chain-of-View Prompting for Spatial Reasoning☆51Jan 23, 2026Updated last month
- ☆16Sep 12, 2023Updated 2 years ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models☆24Oct 5, 2024Updated last year
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…☆68Jun 4, 2024Updated last year
- Learning to Skip the Middle Layers of Transformers☆17Aug 7, 2025Updated 6 months ago
- ☆52May 13, 2024Updated last year
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM☆14Dec 27, 2023Updated 2 years ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆24Mar 4, 2025Updated 11 months ago
- The Official Implementation of Ada-KV [NeurIPS 2025]☆128Nov 26, 2025Updated 3 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated last year
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆44Apr 18, 2025Updated 10 months ago
- The official implementation of BiViT: Extremely Compressed Binary Vision Transformers☆16Jun 18, 2023Updated 2 years ago
- Algorithms for approximate attention in LLMs☆21Apr 14, 2025Updated 10 months ago
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen☆17Sep 7, 2024Updated last year
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)☆25Jul 4, 2024Updated last year
- Streaming Video Diffusion: Online Video Editing with Diffusion Models☆18Jun 3, 2024Updated last year
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs☆82Jan 17, 2026Updated last month
- [ICLR 2025] Official PyTorch implementation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stit…☆104Feb 26, 2024Updated 2 years ago
- Implementation for HiPrune, a training-free visual token pruning method for VLM acceleration.☆45Oct 29, 2025Updated 4 months ago
- Tiny optimized Stable-diffusion that can run on GPUs with just 1GB of VRAM. (Beta)☆182Jul 20, 2023Updated 2 years ago
- FID computation in Jax/Flax.☆29Jul 17, 2024Updated last year
- C^3-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking☆37Jun 30, 2025Updated 8 months ago
- ☆13Oct 5, 2025Updated 4 months ago
- ☆85Jan 23, 2025Updated last year
- ACL 2023☆39Jun 6, 2023Updated 2 years ago
- [ICCV 2021] Official implementation of "Scalable Vision Transformers with Hierarchical Pooling"