thu-nics / FrameFusion
The official code implementation of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
★37 · Updated last month
Alternatives and similar repositories for FrameFusion:
Users that are interested in FrameFusion are comparing it to the libraries listed below
- Collection of token reduction for model compression resources. ★47 · Updated last week
- Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ★83 · Updated 3 weeks ago
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ★68 · Updated 3 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ★44 · Updated 2 months ago
- Official code for paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster". ★64 · Updated 3 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models. ★121 · Updated 10 months ago
- The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models". ★35 · Updated 2 weeks ago
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality". ★46 · Updated last week
- DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models. ★38 · Updated 2 weeks ago
- [EMNLP 2024 Findings] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…" ★92 · Updated 4 months ago
- Code release for VTW (AAAI 2025, Oral). ★33 · Updated 2 months ago
- Collection of awesome generation acceleration resources. ★182 · Updated this week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction. ★84 · Updated 3 weeks ago
- ★155 · Updated 2 months ago
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More". ★26 · Updated this week
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ★26 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification". ★19 · Updated this week
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation. ★71 · Updated last week
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching. ★98 · Updated 8 months ago
- Accelerating Vision Diffusion Transformers with Skip Branches. ★64 · Updated this week
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models. ★66 · Updated this week
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models. ★82 · Updated last month
- A collection of recent token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI. ★27 · Updated last week
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation. ★120 · Updated 2 months ago
- [ICLR 2025] VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation. ★255 · Updated 2 months ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More. ★36 · Updated last month
- [NeurIPS 2024 Oral] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ★150 · Updated 5 months ago
- The official implementation of "Dynamic Tuning towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024). ★43 · Updated 3 months ago
- [CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Mo…" ★61 · Updated 8 months ago
- ★41 · Updated 2 months ago