JIA-Lab-research/VisionZip

Official repository for VisionZip (CVPR 2025)

☆443

Alternatives and similar repositories for VisionZip

Users interested in VisionZip are comparing it to the repositories listed below.

Yangsenqiao / ULDA
Unified Language-driven Zero-shot Domain Adaptation (CVPR 2024)
☆17 · Nov 28, 2024 · Updated last year
Gumpest / SparseVLMs
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
☆266 · Dec 22, 2025 · Updated 6 months ago
JIA-Lab-research / VisionThink
[NeurIPS 2025] Efficient Reasoning Vision Language Models
☆459 · Sep 18, 2025 · Updated 10 months ago
daixiangzi / Awesome-Token-Compress
A paper list of some recent works about Token Compress for Vit and VLM
☆939 · Updated this week
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆592 · Jan 4, 2025 · Updated last year
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆114 · Jun 29, 2025 · Updated last year
KD-TAO / DyCoke
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆113 · Nov 22, 2025 · Updated 7 months ago
vbdi / divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
☆86 · Apr 16, 2026 · Updated 3 months ago
yu-lin-li / ReBalance
[ICLR 2026] Efficient Reasoning with Balanced Thinking
☆131 · May 30, 2026 · Updated last month
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆151 · Mar 6, 2025 · Updated last year
ywh187 / FitPrune
☆68 · Jan 23, 2026 · Updated 5 months ago
Theia-4869 / VisPruner
[ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
☆84 · Jul 1, 2025 · Updated last year
LLaVA-VL / LLaVA-NeXT
☆4,709 · Jun 15, 2026 · Updated last month
thu-nics / FrameFusion
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆76 · Jan 13, 2026 · Updated 6 months ago
Osilly / dynamic_llava
[ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…
☆72 · Sep 18, 2025 · Updated 10 months ago
Theia-4869 / CDPruner
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
☆105 · Sep 20, 2025 · Updated 10 months ago
Yangsenqiao / Awesome-Continual-Test-Time-Adaptation
Collection of awesome Continual Test-Time Adaptation methods
☆24 · Jun 4, 2024 · Updated 2 years ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23 · Jun 24, 2026 · Updated 3 weeks ago
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆422 · Jan 29, 2026 · Updated 5 months ago
EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,320 · Updated this week
cokeshao / HoliTom
[NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models
☆84 · Oct 10, 2025 · Updated 9 months ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205 · Jun 18, 2025 · Updated last year
xuyang-liu16 / Awesome-Token-level-Model-Compression
📚 Collection of token-level model compression resources.
☆200 · Sep 3, 2025 · Updated 10 months ago
liuting20 / MustDrop
Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model
☆36 · Jan 8, 2025 · Updated last year
JIA-Lab-research / LSDBench
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…
☆28 · Aug 7, 2025 · Updated 11 months ago
codefanw / FlashSloth
[CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
☆64 · Oct 10, 2025 · Updated 9 months ago
JIA-Lab-research / Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
☆398 · Jan 19, 2025 · Updated last year
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
JIA-Lab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,662 · Feb 16, 2025 · Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆123 · Jan 22, 2025 · Updated last year
Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆71 · May 15, 2025 · Updated last year
JIA-Lab-research / LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
☆861 · Jul 29, 2024 · Updated last year
open-compass / VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,291 · Updated this week
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆879 · Dec 14, 2025 · Updated 7 months ago
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,492 · Mar 9, 2026 · Updated 4 months ago
JIA-Lab-research / Lyra
[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
☆307 · Jan 9, 2025 · Updated last year
ictnlp / LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in …
☆574 · Jun 29, 2025 · Updated last year
EffiVLM-Bench / EffiVLM-Bench
☆35 · Jun 3, 2025 · Updated last year
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆173 · Mar 8, 2026 · Updated 4 months ago