THUNLP-MT / EscapeCraft
Official repo for EscapeCraft (a 3D environment for room escape) and the benchmark MM-Escape. This work was accepted at ICCV 2025.
☆36 · Updated 7 months ago
Alternatives and similar repositories for EscapeCraft
Users interested in EscapeCraft are comparing it to the repositories listed below.
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆88 · Updated last year
- ☆81 · Updated 7 months ago
- We introduce BabyVision, a benchmark revealing the infancy of AI vision. ☆173 · Updated 3 weeks ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆86 · Updated 6 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆37 · Updated last year
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆31 · Updated 6 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated last week
- Multimodal RewardBench ☆60 · Updated 11 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆60 · Updated last year
- ☆28 · Updated 11 months ago
- Official implementation of MIA-DPO ☆70 · Updated last year
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆90 · Updated 6 months ago
- ☆116 · Updated 6 months ago
- ☆46 · Updated last year
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆58 · Updated last week
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆94 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 · Updated 10 months ago
- ☆110 · Updated last year
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆240 · Updated 6 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆118 · Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 · Updated 6 months ago
- Co-Reinforcement Learning for Unified Multimodal Understanding and Generation ☆39 · Updated 6 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆86 · Updated 5 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆66 · Updated last month
- [ICLR 2026] The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆142 · Updated last week
- A collection of awesome think with videos papers. ☆86 · Updated 2 months ago
- Official code for MotionBench (CVPR 2025) ☆63 · Updated 11 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆103 · Updated 6 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Updated last year
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆237 · Updated last week