JaaackHongggg / WorldSense

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

☆26

Alternatives and similar repositories for WorldSense

Users who are interested in WorldSense are comparing it to the repositories listed below

AV-Odyssey / AV-Odyssey
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
☆26 Updated 7 months ago
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆63 Updated 6 months ago
BriansIDP / video-SALMONN-o1
☆33 Updated 2 months ago
aiming-lab / ReAgent-V
☆27 Updated 2 months ago
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆46 Updated 3 weeks ago
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'
☆52 Updated last month
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆19 Updated 3 weeks ago
deep-spin / Infinite-Video
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
☆15 Updated 5 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆69 Updated 3 months ago
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆36 Updated last month
Aurora-slz / MM-Verify
☆13 Updated 5 months ago
HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi…
☆47 Updated 2 months ago
lzw-lzw / UnifiedMLLM
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆22 Updated last year
MikeWangWZHL / dymu
☆16 Updated 2 months ago
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆77 Updated 2 weeks ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆180 Updated last month
GaryStack / MMR-V
Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?
☆35 Updated last month
xuyang-liu16 / GlobalCom2
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆30 Updated 2 weeks ago
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆16 Updated 5 months ago
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆27 Updated this week
yale-nlp / TOMATO
☆28 Updated 9 months ago
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆70 Updated this week
yellow-binary-tree / MMDuet
Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…
☆33 Updated 6 months ago
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65 Updated last month
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆23 Updated 7 months ago
AlenjandroWang / ASVR
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
☆36 Updated last month
MME-Benchmarks / MME-Unify
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆41 Updated 3 months ago
ekonwang / VisuoThink
[Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul…
☆27 Updated 2 weeks ago
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆44 Updated 2 weeks ago
Hongcheng-Gao / HAVEN
Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".
☆17 Updated 2 months ago