ArmenJeddi / saint
A training-free approach to accelerating ViTs and VLMs by pruning redundant tokens based on similarity
☆37 · Updated 4 months ago
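The core idea described above is to identify visual tokens that are near-duplicates of other tokens and drop them before the expensive attention layers. Below is a minimal, illustrative sketch of similarity-based token pruning in PyTorch. It is not the saint implementation: the helper name `prune_redundant_tokens`, the tensor shapes, and the scoring rule (ranking tokens by their nearest-neighbour cosine similarity) are assumptions chosen only to show the general technique.

```python
# Illustrative sketch only (not the saint repo's code): prune the tokens that are
# most similar to some other token, i.e. the most redundant ones.
import torch

def prune_redundant_tokens(tokens: torch.Tensor, num_prune: int) -> torch.Tensor:
    """tokens: (batch, n, dim) visual tokens; returns (batch, n - num_prune, dim)."""
    normed = torch.nn.functional.normalize(tokens, dim=-1)
    sim = normed @ normed.transpose(-2, -1)               # (batch, n, n) pairwise cosine similarity
    sim.diagonal(dim1=-2, dim2=-1).fill_(float("-inf"))   # ignore self-similarity
    redundancy = sim.max(dim=-1).values                   # similarity to each token's nearest neighbour
    num_keep = tokens.shape[1] - num_prune
    keep = redundancy.topk(num_keep, dim=-1, largest=False).indices  # least redundant tokens
    keep, _ = keep.sort(dim=-1)                           # preserve the original token order
    index = keep.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return torch.gather(tokens, 1, index)

# Example: keep 192 of 256 visual tokens per image
x = torch.randn(2, 256, 768)
print(prune_redundant_tokens(x, num_prune=64).shape)      # torch.Size([2, 192, 768])
```

Many of the repositories listed below follow the same pattern but differ in how they score redundancy (e.g. [CLS] attention, text-visual attention, diversity, or clustering) and where in the pipeline the pruning happens.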
Alternatives and similar repositories for saint
Users who are interested in saint are comparing it to the libraries listed below.
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆92 · Updated 3 months ago
- ☆27 · Updated 7 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆163 · Updated 4 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆41 · Updated 5 months ago
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆37 · Updated 3 months ago
- The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24) ☆66 · Updated 3 months ago
- Official repository of InLine attention (NeurIPS 2024) ☆56 · Updated 9 months ago
- [CVPR 2025] PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models ☆44 · Updated last week
- CLIP-MoE: Mixture of Experts for CLIP ☆47 · Updated last year
- [2025] Efficient Vision Language Models: A Survey ☆32 · Updated 2 months ago
- ☆58 · Updated 5 months ago
- [ICML 2025] Official code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection ☆23 · Updated 3 months ago
- The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25) ☆10 · Updated 3 months ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More ☆56 · Updated 8 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆35 · Updated 9 months ago
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ☆83 · Updated 9 months ago
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers. ☆34 · Updated 9 months ago
- Code release for VTW (AAAI 2025 Oral) ☆50 · Updated 2 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆30 · Updated last year
- [NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks ☆130 · Updated 10 months ago
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆35 · Updated last month
- [CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models ☆47 · Updated 4 months ago
- [EMNLP 2025 main] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆74 · Updated last month
- [NeurIPS 2024] Official implementation of "Visual Fourier Prompt Tuning" ☆33 · Updated 8 months ago
- Data distillation benchmark ☆68 · Updated 3 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆101 · Updated 3 months ago
- [CVPR 2025] Breaking the Low-Rank Dilemma of Linear Attention ☆29 · Updated 7 months ago
- [ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference ☆45 · Updated last year
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆149 · Updated 2 weeks ago
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆53 · Updated 2 weeks ago