42Shawn/LLaVA-PruMerge

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

☆173

Alternatives and similar repositories for LLaVA-PruMerge

Users who are interested in LLaVA-PruMerge are comparing it to the repositories listed below.

pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆591 · Jan 4, 2025 · Updated last year
ichbill / LTDD
Official Implementation of paper "Distilling Long-tailed Datasets" [CVPR 2025]
☆24 · Aug 13, 2025 · Updated 11 months ago
ywh187 / FitPrune
☆68 · Jan 23, 2026 · Updated 6 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆151 · Mar 6, 2025 · Updated last year
double125 / MADTP
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
☆50 · Sep 6, 2024 · Updated last year
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205 · Jun 18, 2025 · Updated last year
gordonhu608 / MQT-LLaVA
[NeurIPS 2024] Matryoshka Query Transformer for Large Vision-Language Models
☆126 · Jul 1, 2024 · Updated 2 years ago
Gumpest / SparseVLMs
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
☆267 · Dec 22, 2025 · Updated 7 months ago
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆103 · Nov 9, 2024 · Updated last year
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆58 · Apr 18, 2025 · Updated last year
daixiangzi / Awesome-Token-Compress
A paper list of recent works on token compression for ViT and VLM
☆944 · Updated this week
hatchetProject / QuEST
[ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models
☆60 · Jun 26, 2025 · Updated last year
CircleRadon / TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆279 · May 26, 2025 · Updated last year
deepcs233 / Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447 · Dec 22, 2024 · Updated last year
locuslab / llava-token-compression
☆47 · Nov 8, 2024 · Updated last year
Osilly / dynamic_llava
[ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
☆72 · Sep 18, 2025 · Updated 10 months ago
JIA-Lab-research / VisionZip
Official repository for VisionZip (CVPR 2025)
☆443 · Jul 21, 2025 · Updated last year
MikeWangWZHL / dymu
☆29 · May 13, 2025 · Updated last year
ZichenWen1 / DART
[EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆121 · Oct 12, 2025 · Updated 9 months ago
KD-TAO / DyCoke
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆113 · Nov 22, 2025 · Updated 8 months ago
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆45 · Oct 28, 2025 · Updated 8 months ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆69 · Jun 9, 2024 · Updated 2 years ago
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆49 · Jan 8, 2025 · Updated last year
adreamwu / PTQ4DiT
PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005
☆49 · Nov 8, 2024 · Updated last year
liuting20 / MustDrop
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
☆36 · Jan 8, 2025 · Updated last year
sdc17 / CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
☆34 · Dec 30, 2024 · Updated last year
wangqinsi1 / CoreInfer
This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act…
☆18 · Oct 25, 2024 · Updated last year
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆163 · Dec 26, 2024 · Updated last year
luogen1996 / LLaVA-HR
[ICLR2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆249 · Aug 14, 2024 · Updated last year
OpenGVLab / DiffRate
[ICCV 23] An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging tech…
☆103 · Jul 14, 2023 · Updated 3 years ago
wangqinsi1 / 2025-ICML-CoreMatching
[ICML 2025] CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision Language Model
☆16 · May 27, 2025 · Updated last year
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆183 · Oct 14, 2024 · Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆123 · Jan 22, 2025 · Updated last year
EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,336 · Updated this week
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆114 · Jun 29, 2025 · Updated last year
MCG-NJU / Video-DC
☆12 · Jul 30, 2025 · Updated 11 months ago
NUS-HPC-AI-Lab / DD-Ranking
Data distillation benchmark
☆73 · Jun 13, 2025 · Updated last year
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆995 · Updated this week
thu-nics / DiTFastAttn
☆192 · Jan 14, 2025 · Updated last year