Rh-Dang / ECBench
A Holistic Embodied Cognition Benchmark
☆18 · Updated 10 months ago
Alternatives and similar repositories for ECBench
Users interested in ECBench are comparing it to the repositories listed below.
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆83 · Updated 11 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆72 · Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Updated last year
- ☆63 · Updated last week
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning ☆79 · Updated last year
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆87 · Updated 8 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated last week
- Egocentric Video Understanding Dataset (EVUD) ☆32 · Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆38 · Updated 2 weeks ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆86 · Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 · Updated 6 months ago
- ☆37 · Updated 8 months ago
- ☆110 · Updated last year
- ☆41 · Updated 8 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆41 · Updated 6 months ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆48 · Updated 5 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio ☆52 · Updated 6 months ago
- VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice ☆61 · Updated last month
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆43 · Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Updated last year
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆34 · Updated last year
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆77 · Updated 2 weeks ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆46 · Updated 8 months ago
- ☆97 · Updated 7 months ago
- ☆117 · Updated 6 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆45 · Updated 7 months ago
- ☆41 · Updated 5 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 · Updated 6 months ago
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆39 · Updated 11 months ago