Rh-Dang / ECBench
A Holistic Embodied Cognition Benchmark
☆18Updated 10 months ago
Alternatives and similar repositories for ECBench
Users who are interested in ECBench compare it to the repositories listed below.
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆72Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆72Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆83Updated 11 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Updated last year
- ☆110Updated last year
- VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆61Updated last month
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆41Updated 6 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆79Updated this week
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆52Updated 6 months ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs"☆48Updated 5 months ago
- ☆46Updated last year
- A collection of awesome thinking-with-videos papers.☆87Updated 2 months ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆43Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 6 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆42Updated 3 months ago
- ☆63Updated last week
- [IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning☆79Updated last year
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆57Updated 2 weeks ago
- ☆41Updated 8 months ago
- Egocentric Video Understanding Dataset (EVUD)☆32Updated last year
- [ICLR 2024] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆46Updated 8 months ago
- ☆117Updated 6 months ago
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding"☆34Updated last year
- ☆97Updated 7 months ago
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning☆44Updated 3 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆45Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆129Updated 6 months ago
- ☆41Updated 5 months ago
- [ICCV 2025] Dynamic-VLM☆28Updated last year
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos☆65Updated 5 months ago