xuyang-liu16 / Awesome-Token-level-Model-Compression
Collection of token-reduction resources for model compression.
☆51 · Updated last week
Alternatives and similar repositories for Awesome-Token-level-Model-Compression:
Users interested in Awesome-Token-level-Model-Compression are comparing it to the libraries listed below
- Official implementation of the paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆93 · Updated last month
- The official code implementation of the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models". ☆39 · Updated last week
- Code release for VTW (AAAI 2025 Oral). ☆34 · Updated 3 months ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster". ☆70 · Updated 4 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model. ☆27 · Updated 3 months ago
- A collection of recent token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI. ☆39 · Updated this week
- [arXiv 2025] Efficient Reasoning Models: A Survey. ☆107 · Updated this week
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More". ☆36 · Updated 3 weeks ago
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ☆71 · Updated 3 months ago
- [EMNLP 2024 Findings 🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…". ☆92 · Updated 5 months ago
- ☆42 · Updated 3 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆50 · Updated 3 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models. ☆126 · Updated 11 months ago
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification. ☆21 · Updated 3 weeks ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…". ☆31 · Updated 4 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction. ☆89 · Updated last month
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality". ☆46 · Updated 3 weeks ago
- DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models. ☆40 · Updated 2 weeks ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models. ☆18 · Updated last week
- Collection of awesome generation acceleration resources. ☆215 · Updated this week
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models". ☆37 · Updated last month
- [NeurIPS 2024 Oral 🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆157 · Updated 6 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models. ☆87 · Updated 2 months ago
- Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints. ☆65 · Updated 3 weeks ago
- This is a repository for organizing papers, code, and other resources related to unified multimodal models. ☆173 · Updated 2 weeks ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…. ☆64 · Updated last year
- ☆99 · Updated 9 months ago
- PyTorch code for our paper "ARB-LLM: Alternating Refined Binarizations for Large Language Models". ☆24 · Updated last month
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005). ☆27 · Updated 5 months ago
- ☆80 · Updated last month