[NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
☆31 · Mar 30, 2025 · Updated last year
Alternatives and similar repositories for ZipCache
Users that are interested in ZipCache are comparing it to the libraries listed below.
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality… ☆51 · Mar 25, 2025 · Updated last year
- [ICCV 2025] The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation" ☆62 · Apr 5, 2025 · Updated last year
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specifically optimized for diffusion models. ☆25 · Mar 29, 2024 · Updated 2 years ago
- [ACL 2026 Findings] CoV: Chain-of-View Prompting for Spatial Reasoning ☆60 · Apr 7, 2026 · Updated last month
- ☆16 · Sep 12, 2023 · Updated 2 years ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Oct 5, 2024 · Updated last year
- ☆12 · Sep 7, 2024 · Updated last year
- ☆53 · May 13, 2024 · Updated 2 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆20 · Jun 19, 2025 · Updated 11 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆71 · Jun 4, 2024 · Updated last year
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models ☆103 · Mar 12, 2024 · Updated 2 years ago
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM ☆14 · Dec 27, 2023 · Updated 2 years ago
- The official implementation of BiViT: Extremely Compressed Binary Vision Transformers ☆16 · Jun 18, 2023 · Updated 2 years ago
- The Official Implementation of Ada-KV [NeurIPS 2025] ☆132 · Nov 26, 2025 · Updated 5 months ago
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs ☆85 · Jan 17, 2026 · Updated 4 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆40 · Mar 11, 2024 · Updated 2 years ago
- Benchmarking Attention Mechanism in Vision Transformers. ☆20 · Oct 10, 2022 · Updated 3 years ago
- [ICLR 2026] This is the official PyTorch implementation of "BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Gen… ☆43 · Oct 9, 2025 · Updated 7 months ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 9 months ago
- A minimal implementation of FlashAttention v2, written for learning purposes. ☆16 · Jun 12, 2024 · Updated last year
- [ACL-26 Findings] Implementation of HiPrune, a training-free visual token pruning method for VLM acceleration. ☆54 · Apr 29, 2026 · Updated 3 weeks ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆183 · Jul 12, 2024 · Updated last year
- [ICCV 2025] The official code of "DreamRelation: Relation-Centric Video Customization" ☆26 · Feb 4, 2026 · Updated 3 months ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" ☆23 · Mar 4, 2025 · Updated last year
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆44 · Apr 18, 2025 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆339 · Jul 2, 2024 · Updated last year
- Algorithms for approximate attention in LLMs ☆22 · Apr 14, 2025 · Updated last year
- [ICLR 2025] Official PyTorch implementation of paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stit… ☆107 · Feb 26, 2024 · Updated 2 years ago
- Tiny optimized Stable Diffusion that can run on GPUs with just 1GB of VRAM. (Beta) ☆184 · Jul 20, 2023 · Updated 2 years ago
- A curated list for feed-forward 3D scene modeling, including research directions, datasets, and applications. ☆230 · Apr 22, 2026 · Updated 3 weeks ago
- [IJCV 2025] Paragraph-to-Image Generation with Information-Enriched Diffusion Model ☆107 · Mar 24, 2025 · Updated last year
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen ☆17 · Sep 7, 2024 · Updated last year
- An example project showing how to build a pip-installable Python package that invokes custom CUDA/C++ code ☆14 · Jul 12, 2017 · Updated 8 years ago
- Streaming Video Diffusion: Online Video Editing with Diffusion Models ☆18 · Jun 3, 2024 · Updated last year
- This repository implements the paper "Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations" ☆20 · Aug 30, 2021 · Updated 4 years ago
- ☆47 · Nov 25, 2024 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆31 · Mar 12, 2024 · Updated 2 years ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆399 · Nov 20, 2025 · Updated 6 months ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Feb 26, 2025 · Updated last year