JaaackHongggg / WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
☆23 · Updated last month
Alternatives and similar repositories for WorldSense
Users interested in WorldSense are comparing it to the repositories listed below.
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆25 · Updated 5 months ago
- Official implementation of MIA-DPO ☆58 · Updated 4 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆59 · Updated 2 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Updated 10 months ago
- ☆29 · Updated this week
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆45 · Updated 4 months ago
- Official PyTorch code for ReKV (ICLR'25) ☆23 · Updated 2 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆35 · Updated last month
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆20 · Updated 5 months ago
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆22 · Updated this week
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆66 · Updated last month
- ☆37 · Updated 10 months ago
- ☆84 · Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆60 · Updated 11 months ago
- Official implementation of the paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆34 · Updated 2 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆46 · Updated 3 weeks ago
- ☆43 · Updated 5 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆37 · Updated 5 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆72 · Updated 2 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆53 · Updated 2 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆15 · Updated 2 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated 7 months ago
- LMM solved catastrophic forgetting, AAAI2025 ☆43 · Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆49 · Updated this week
- Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward ☆35 · Updated 2 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆46 · Updated 2 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆22 · Updated 4 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆63 · Updated 10 months ago
- Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆20 · Updated last month
- ☆77 · Updated 4 months ago