BAAI-DCAI/MMVU

☆57

Alternatives and similar repositories for MMVU

Users who are interested in MMVU are comparing it to the repositories listed below.

LAW1223 / OpenSubject
☆55 · Dec 10, 2025 · Updated 7 months ago
BBBiiinnn / SynArtifact
☆18 · Apr 28, 2024 · Updated 2 years ago
GeekGuru123 / ProfilingDiT
☆20 · Jan 1, 2026 · Updated 6 months ago
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,052 · Nov 18, 2024 · Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆77 · Mar 14, 2024 · Updated 2 years ago
chrisx599 / Video-Browser
Official code repo of Video-Browser: Towards Agentic Open-web Video Browsing
☆28 · Jan 19, 2026 · Updated 6 months ago
Sammy20207109 / DyCo-RL
DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning
☆18 · Jun 14, 2026 · Updated last month
BAAI-DCAI / Visual-Instruction-Tuning
SVIT: Scaling up Visual Instruction Tuning
☆167 · Jun 20, 2024 · Updated 2 years ago
KyleHuang9 / SeFAR
[AAAI 2025] SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
☆30 · Jan 3, 2025 · Updated last year
EthanLiang99 / AuthFace
AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior (ACM MM 2025 Oral)
☆18 · Mar 5, 2026 · Updated 4 months ago
CMMMU-Benchmark / CMMMU
☆48 · Sep 5, 2024 · Updated last year
DuNGEOnmassster / VideoGen-of-Thought
[NeurIPS 2025 NextVid Workshop Oral✨] Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minim…
☆63 · Sep 22, 2025 · Updated 9 months ago
RUCAIBox / Event-Bench
Official code of *Towards Event-oriented Long Video Understanding*
☆12 · Jul 26, 2024 · Updated last year
IDEA-Research / V-Reflection
Related code, checkpoints and project page for V-Reflection
☆60 · Apr 7, 2026 · Updated 3 months ago
JUNJIE99 / MLVU
🔥🔥 MLVU: Multi-task Long Video Understanding Benchmark
☆263 · Apr 13, 2026 · Updated 3 months ago
vl-illusion / GVIL
Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"
☆15 · Jan 25, 2024 · Updated 2 years ago
BAAI-DCAI / Dataset-Pruning
Dataset pruning for ImageNet and LAION-2B.
☆80 · Jul 5, 2024 · Updated 2 years ago
shuyansy / MLLM-Semantic-Hallucination
🔥🔥 [NeurIPS 2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning
☆30 · Dec 11, 2025 · Updated 7 months ago
HaroldChen19 / VistaDPO
[ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
☆41 · Jun 14, 2025 · Updated last year
Lexiang-Xiong / CAD
[ECCV 2026] Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
☆28 · Jun 20, 2026 · Updated last month
Liyan06 / AggreFact
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (ACL 2023)
☆28 · Mar 26, 2024 · Updated 2 years ago
WenjieShu / LoopViT
☆45 · Feb 4, 2026 · Updated 5 months ago
DuNGEOnmassster / awesome-customized-generative-AI
A collection of papers and code for customized, personalized, and editable generative models
☆28 · Oct 1, 2024 · Updated last year
CNVid / CNVid-3.5M
This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…
☆26 · Nov 28, 2023 · Updated 2 years ago
rileycai / newsSearch
Social news retrieval system, Introduction to Information Retrieval
☆13 · Nov 19, 2019 · Updated 6 years ago
zhengxuJosh / SAM4SS
SAM4SS: Tailoring SAM and SAM2 for Semantic Segmentation
☆11 · Jul 31, 2024 · Updated last year
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆107 · Mar 14, 2024 · Updated 2 years ago
SuhZhang / GeoSR
Code for the paper 'Make Geometry Matter for Spatial Reasoning'
☆53 · Updated this week
BAAI-DCAI / Training-Data-Synthesis
[ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching
☆80 · Dec 9, 2023 · Updated 2 years ago
ArtsEngine / concreteness
Concreteness ratings list
☆27 · Feb 22, 2017 · Updated 9 years ago
hellomuffin / exif-as-language
Official repo for the paper "EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata"
☆53 · Nov 3, 2023 · Updated 2 years ago
dali92002 / OCR-TR
Optical Character Recognition (OCR / HTR) using Transformers
☆11 · Aug 20, 2022 · Updated 3 years ago
shuyansy / VidText
Comprehensive benchmark for video text understanding
☆29 · Jun 4, 2025 · Updated last year
yunfanLu / Self-EvRSVFI
[IEEE TVCG 2025] Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames
☆11 · Jun 1, 2025 · Updated last year
FudanDISC / ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆46 · Nov 17, 2023 · Updated 2 years ago
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆94 · Feb 17, 2025 · Updated last year
JWLiang007 / PFF
Official implementation of "Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection" (ICLR 2024)
☆18 · Apr 15, 2024 · Updated 2 years ago
BeingBeyond / FAST
General Humanoid Whole-Body Control via Pretraining and Rapid Adaptation
☆18 · Feb 13, 2026 · Updated 5 months ago
czg1225 / VeriThinker
[NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient
☆67 · Sep 27, 2025 · Updated 9 months ago