EvolvingLMMs-Lab / EASI
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
☆44 · Updated this week
Alternatives and similar repositories for EASI
Users interested in EASI are comparing it to the repositories listed below.
- ☆55 · Updated 3 weeks ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆46 · Updated 5 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆122 · Updated 4 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆177 · Updated last week
- ☆95 · Updated 5 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆197 · Updated 4 months ago
- [ACL 2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆110 · Updated 3 months ago
- ☆79 · Updated 5 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆186 · Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆104 · Updated last month
- Quick Long Video Understanding ☆70 · Updated last month
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆37 · Updated 5 months ago
- ☆62 · Updated 3 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆80 · Updated 4 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆149 · Updated 8 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆131 · Updated 3 months ago
- ☆63 · Updated 4 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆82 · Updated last week
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 · Updated 10 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆137 · Updated 4 months ago
- ☆68 · Updated 2 months ago
- ☆63 · Updated last month
- TStar is a unified temporal search framework for long-form video question answering ☆75 · Updated 3 months ago
- The code repository of UniRL ☆46 · Updated 6 months ago
- An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆147 · Updated last month
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning ☆32 · Updated 3 months ago
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆95 · Updated this week
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆120 · Updated 3 weeks ago
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆116 · Updated last week
- [NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason… ☆144 · Updated 2 months ago