Rh-Dang / ECBench
A Holistic Embodied Cognition Benchmark
☆18 · Updated 8 months ago
Alternatives and similar repositories for ECBench
Users who are interested in ECBench are comparing it to the libraries listed below.
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆36 · Updated 4 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆34 · Updated 3 weeks ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆81 · Updated 5 months ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆36 · Updated 10 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Updated 11 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆65 · Updated 3 weeks ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆48 · Updated 3 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Updated last year
- ☆95 · Updated 5 months ago
- Egocentric Video Understanding Dataset (EVUD) ☆32 · Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆63 · Updated 4 months ago
- ☆62 · Updated 3 months ago
- ☆108 · Updated 4 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 4 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆129 · Updated 4 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆106 · Updated last month
- ✨✨ The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆50 · Updated 5 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆33 · Updated 2 weeks ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆79 · Updated 9 months ago
- A collection of awesome think with videos papers. ☆72 · Updated last week
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆38 · Updated 9 months ago
- Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning" ☆104 · Updated last week
- ☆16 · Updated 2 months ago
- ☆104 · Updated 11 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆65 · Updated last year
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆33 · Updated last year
- ☆46 · Updated 11 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated 6 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision ☆42 · Updated last month
- [ICCV 2025] Dynamic-VLM ☆26 · Updated 11 months ago