OpenGVLab/MMT-Bench

[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

☆119

Alternatives and similar repositories for MMT-Bench

Users who are interested in MMT-Bench are comparing it to the repositories listed below.

OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆98 · Sep 14, 2024 · Updated last year
KainingYing / CTVIS
[ICCV 2023] CTVIS: Consistent Training for Online Video Instance Segmentation
☆82 · Oct 15, 2023 · Updated 2 years ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆85 · Jan 27, 2025 · Updated last year
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆160 · Oct 21, 2025 · Updated 8 months ago
FudanCVL / SAAS
[AAAI 2026] Segment Anything Across Shots: A Method and Benchmark
☆29 · Nov 16, 2025 · Updated 8 months ago
OpenGVLab / Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag…
☆564 · Apr 21, 2024 · Updated 2 years ago
open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
☆4,285 · Updated this week
tianyu-z / VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
☆32 · Feb 26, 2025 · Updated last year
tsb0601 / MMVP
☆364 · Jan 27, 2024 · Updated 2 years ago
GeWu-Lab / Ref-AVS
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
☆50 · Oct 12, 2025 · Updated 9 months ago
dali-does / clevr-math
☆13 · May 9, 2023 · Updated 3 years ago
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆306 · May 22, 2025 · Updated last year
s-vco / s-vco
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
☆19 · Jun 4, 2025 · Updated last year
OpenGVLab / DiffAgent
[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
☆19 · Apr 16, 2024 · Updated 2 years ago
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆108 · Sep 29, 2024 · Updated last year
OpenGVLab / LLMPrune-BESA
BESA is a differentiable weight pruning technique for large language models.
☆17 · Mar 4, 2024 · Updated 2 years ago
AILab-CVC / SEED-Bench
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
☆365 · Jan 14, 2025 · Updated last year
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
OpenGVLab / Multitask-Model-Selector
[NeurIPS 2023] Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector"
☆37 · Mar 7, 2024 · Updated 2 years ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆171 · Sep 27, 2025 · Updated 9 months ago
WildVision-AI / WildVision-Bench
☆17 · Oct 21, 2024 · Updated last year
google-research-datasets / 2.5vrd
This dataset contains about 110k images annotated with the depth and occlusion relationships between arbitrary objects. It enables resear…
☆16 · Apr 28, 2021 · Updated 5 years ago
FudanCVL / AVI-Bench
[ICML'26] Toward Human-like Audio-Visual Intelligence of Omni-MLLMs
☆15 · Jun 20, 2026 · Updated last month
FreedomIntelligence / TRIM
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22 · Jan 11, 2026 · Updated 6 months ago
xqlin98 / Fair-yet-Equal-CML
This is the official implementation of the ICML 2023 paper "Fair yet Asymptotically Equal Collaborative Learning"
☆10 · May 29, 2023 · Updated 3 years ago
tliby / UniFork
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
☆48 · Aug 26, 2025 · Updated 10 months ago
CMMMU-Benchmark / CMMMU
☆48 · Sep 5, 2024 · Updated last year
adlnlp / pdfvqa
☆18 · Jun 12, 2024 · Updated 2 years ago
SALT-NLP / PersuationGames
[ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc…
☆16 · Feb 22, 2025 · Updated last year
princeton-nlp / CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
☆158 · Apr 22, 2025 · Updated last year
Kevinz-code / SeVa
[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501
☆60 · Jul 26, 2024 · Updated last year
aim-uofa / DiffewS
[NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (DiffewS)
☆51 · Apr 14, 2025 · Updated last year
heshuting555 / SegPoint
☆38 · Jul 19, 2024 · Updated 2 years ago
SCNU203 / GeoQA-Plus
☆20 · May 14, 2024 · Updated 2 years ago
FanqingM / MM-Eureka-V0
MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka
☆325 · Jun 21, 2025 · Updated last year
YiyangZhou / POVID
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆94 · Apr 30, 2024 · Updated 2 years ago
omipan / svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
☆21 · Jan 11, 2024 · Updated 2 years ago
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58 · Oct 28, 2024 · Updated last year
JustinYuu / MACIL_SD
[ACM MM 2022] Modality-aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
☆42 · Jul 13, 2022 · Updated 4 years ago