foundation-multimodal-models / ConBenchLinks

[NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".

☆38

Alternatives and similar repositories for ConBench

Users that are interested in ConBench are comparing it to the libraries listed below

Sorting:

Visual-AI / PruneVid
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
☆55Updated 5 months ago
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆48Updated 9 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆122Updated 6 months ago
JierunChen / Ref-L4
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
☆49Updated 10 months ago
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆152Updated last month
SliMM-X / CoMP-MM
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆32Updated 6 months ago
Yanqing0327 / MLLMs-Augmented
The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》
☆31Updated last year
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆57Updated last year
KD-TAO / DyCoke
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆81Updated last month
ggjy / DeLVM
☆119Updated last year
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆132Updated 7 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆48Updated last year
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆135Updated last week
LaVi-Lab / AIM
[ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"
☆42Updated 3 weeks ago
FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆52Updated 6 months ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆87Updated 3 months ago
longvideobench / LongVideoBench
[Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆112Updated last year
NUS-HPC-AI-Lab / Dynamic-Tuning
The official implementation of "2024NeurIPS Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation"
☆50Updated 10 months ago
KangarooGroup / Kangaroo
official impelmentation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
☆67Updated last year
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆39Updated 8 months ago
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆55Updated 4 months ago
alibaba / conv-llava
☆119Updated last year
takomc / amp
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆20Updated last year
ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆95Updated 2 months ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆64Updated last year
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆41Updated 6 months ago
si0wang / VisVM
☆45Updated 10 months ago
MCG-NJU / p-MoD
[ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
☆43Updated 4 months ago
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆70Updated 8 months ago
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆49Updated 7 months ago