BAAI-DCAI / Multimodal-Robustness-Benchmark

☆42

Alternatives and similar repositories for Multimodal-Robustness-Benchmark:

Users who are interested in Multimodal-Robustness-Benchmark are comparing it to the repositories listed below.

BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated 11 months ago
AoiDragon / POPE
[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
☆78 Updated 10 months ago
joez17 / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆37 Updated 4 months ago
foundation-multimodal-models / CAL
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
☆56 Updated 4 months ago
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆143 Updated last month
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆83 Updated 4 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆123 Updated 2 months ago
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆116 Updated 9 months ago
DreamMr / HR-Bench
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…
☆20 Updated 2 weeks ago
Cooperx521 / PyramidDrop
The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction".
☆53 Updated last month
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆61 Updated 5 months ago
yfzhang114 / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆92 Updated last week
pipilurj / bootstrapped-preference-optimization-BPO
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆54 Updated 5 months ago
ywh187 / FitPrune
☆36 Updated last month
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆98 Updated 3 months ago
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆68 Updated 8 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆70 Updated 3 weeks ago
ChocoWu / SeTok
Codes for Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆49 Updated 4 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆43 Updated 7 months ago
HKUST-LongGroup / CoMM
Official repository for CoMM Dataset
☆27 Updated last month
RifleZhang / LLaVA-Hound-DPO
☆138 Updated 3 months ago
KangarooGroup / Kangaroo
Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
☆63 Updated 5 months ago
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆49 Updated 3 weeks ago
Lackel / AGLA
[Arxiv 2024] AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆21 Updated 7 months ago
scofield7419 / Video-of-Thought
Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆94 Updated 2 months ago
longvideobench / LongVideoBench
[NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.
☆85 Updated 6 months ago
pengts / VW-LMM
☆24 Updated 9 months ago
Visual-AI / PruneVid
The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".
☆29 Updated this week
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
☆165 Updated 4 months ago