ArmenJeddi / saint
A training-free approach to accelerating ViTs and VLMs by pruning redundant tokens based on similarity
☆21 · Updated last month
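The description above names a similarity-based, training-free pruning criterion. As a rough illustration only — not the repository's actual code; the function name, the `keep_ratio` parameter, and the cosine-similarity criterion are all assumptions — pruning redundant ViT tokens could look like this PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def prune_redundant_tokens(tokens: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """Illustrative (hypothetical) similarity-based pruning.

    tokens: (batch, num_tokens, dim) patch embeddings from a ViT block.
    Returns: (batch, kept_tokens, dim) with the most redundant tokens removed.
    """
    b, n, d = tokens.shape
    keep = max(1, int(n * keep_ratio))

    # Cosine similarity between every pair of tokens.
    normed = F.normalize(tokens, dim=-1)            # (b, n, d)
    sim = normed @ normed.transpose(1, 2)           # (b, n, n)
    sim.diagonal(dim1=1, dim2=2).fill_(-1.0)        # ignore self-similarity

    # A token whose closest match elsewhere is very similar is redundant:
    # its information is approximately carried by another token.
    redundancy = sim.max(dim=-1).values             # (b, n)

    # Keep the `keep` least redundant tokens, preserving sequence order.
    idx = redundancy.topk(keep, largest=False).indices.sort(dim=-1).values
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))
```

In a training-free setting, a call like `tokens = prune_redundant_tokens(tokens, 0.7)` would sit between transformer blocks, shrinking the sequence that later blocks (and, in a VLM, the language model) must process.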
Alternatives and similar repositories for saint
Users interested in saint are comparing it to the repositories listed below
- CLIP-MoE: Mixture of Experts for CLIP ☆34 · Updated 7 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24) ☆48 · Updated 2 months ago
- ☆36 · Updated 9 months ago
- The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024) ☆46 · Updated 4 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆33 · Updated last month
- Adapting LLaMA Decoder to Vision Transformer ☆28 · Updated 11 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 7 months ago
- Official repository of InLine attention (NeurIPS 2024) ☆46 · Updated 4 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆28 · Updated 4 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆18 · Updated 3 months ago
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers, presented at ICCV 2023 NIVT … ☆35 · Updated last year
- toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts ☆13 · Updated 8 months ago
- [ICML '24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆39 · Updated 11 months ago
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models ☆20 · Updated 5 months ago
- Code release for VTW (AAAI 2025, Oral) ☆39 · Updated 4 months ago
- [CVPR 2025] Breaking the Low-Rank Dilemma of Linear Attention ☆20 · Updated 2 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆41 · Updated last month
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference ☆30 · Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆18 · Updated last month
- ☆44 · Updated last week
- ☆27 · Updated 7 months ago
- ☆28 · Updated 11 months ago
- GIFT: Generative Interpretable Fine-Tuning ☆20 · Updated 7 months ago
- Official implementation for "Knowledge Distillation with Refined Logits" ☆13 · Updated 8 months ago
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation ☆14 · Updated 5 months ago
- [CVPR 2024] The official PyTorch implementation of "A General and Efficient Training for Transformer via Token Expansion" ☆44 · Updated last year
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆36 · Updated 5 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated 7 months ago
- [ECCV 2024 Workshop Best Paper Award] Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion ☆33 · Updated 7 months ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster" (see the sketch below) ☆75 · Updated 5 months ago
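The last entry prunes visual tokens training-free using the [CLS] token's attention map. A minimal sketch of that general idea, assuming access to one block's post-softmax attention weights — the function name and `keep_ratio` are hypothetical, and the paper's actual selection rule may differ:

```python
import torch

def prune_by_cls_attention(tokens: torch.Tensor,
                           attn: torch.Tensor,
                           keep_ratio: float = 0.5) -> torch.Tensor:
    """Illustrative (hypothetical) [CLS]-attention-based pruning.

    tokens: (batch, 1 + num_patches, dim), with [CLS] first.
    attn:   (batch, heads, 1 + num_patches, 1 + num_patches) attention
            weights from the current block (softmax already applied).
    """
    b, n, d = tokens.shape
    keep = max(1, int((n - 1) * keep_ratio))

    # [CLS] row of the attention map, averaged over heads; drop the
    # [CLS]->[CLS] entry so scores index patch tokens only.
    cls_attn = attn.mean(dim=1)[:, 0, 1:]           # (b, n - 1)

    # Keep the top-`keep` patch tokens, preserving spatial order;
    # +1 maps patch indices back to positions after [CLS].
    idx = cls_attn.topk(keep, dim=-1).indices.sort(dim=-1).values + 1
    patches = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))

    # Re-attach [CLS] in front of the surviving patch tokens.
    return torch.cat([tokens[:, :1], patches], dim=1)
```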