Yxxxb / VoCo-LLaMALinks

[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

☆195

Alternatives and similar repositories for VoCo-LLaMA

Users that are interested in VoCo-LLaMA are comparing it to the libraries listed below

Sorting:

42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆156Updated 2 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆134Updated 8 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆170Updated last month
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆199Updated 4 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180Updated last year
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆229Updated 3 weeks ago
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆200Updated last year
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆139Updated 3 months ago
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆151Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147Updated last year
imagegridworth / IG-VLM
☆140Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆128Updated 4 months ago
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’
☆58Updated 5 months ago
yu-rp / VisualPerceptionToken
☆130Updated 8 months ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆251Updated 3 weeks ago
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33Updated last year
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆62Updated 9 months ago
ywh187 / FitPrune
☆61Updated 7 months ago
Liuziyu77 / MIA-DPO
Official implement of MIA-DPO
☆67Updated 10 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168Updated last year
RifleZhang / LLaVA-Reasoner-DPO
☆102Updated 10 months ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆91Updated this week
longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆111Updated last year
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90Updated last year
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211Updated 10 months ago
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models
☆74Updated 4 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆77Updated last year
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆166Updated 6 months ago
Becomebright / ReKV
Official PyTorch Code of ReKV (ICLR'25)
☆72Updated last month
minglllli / CLS-RL
[NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆74Updated 2 months ago