MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
☆57 · Mar 11, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for MMSI-Video-Bench
Users interested in MMSI-Video-Bench are comparing it to the repositories listed below.
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding · ☆74 · Sep 29, 2025 · Updated 6 months ago
- The official repo for EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimo… · ☆21 · Apr 9, 2025 · Updated last year
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) · ☆30 · Oct 28, 2025 · Updated 5 months ago
- ☆22 · Mar 7, 2025 · Updated last year
- Implementation of <Symbolic Graphics Programming with Large Language Models> · ☆38 · Sep 14, 2025 · Updated 6 months ago
- [NeurIPS 2023] MoVie: Visual Model-Based Policy Adaptation for View Generalization · ☆11 · Sep 22, 2023 · Updated 2 years ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models · ☆24 · Jan 1, 2026 · Updated 3 months ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence · ☆83 · Mar 13, 2026 · Updated 3 weeks ago
- [ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging · ☆49 · Mar 31, 2026 · Updated last week
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding" · ☆26 · Aug 24, 2023 · Updated 2 years ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… · ☆87 · Mar 9, 2026 · Updated last month
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams · ☆76 · Mar 15, 2026 · Updated 3 weeks ago
- ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models · ☆16 · Sep 27, 2024 · Updated last year
- Understand what physics/algorithms transformers learn internally when trained on planetary motion · ☆39 · Feb 9, 2026 · Updated 2 months ago
- ☆46 · Updated this week
- Remasking Discrete Diffusion Models with Inference-Time Scaling · ☆70 · Feb 7, 2026 · Updated 2 months ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens · ☆19 · Feb 29, 2024 · Updated 2 years ago
- Cambrian-S: Towards Spatial Supersensing in Video · ☆534 · Updated this week
- ECCV 2026 paper template · ☆41 · Jan 23, 2026 · Updated 2 months ago
- PaperBot: Learning to Design Real-World Tools Using Paper · ☆13 · Mar 15, 2024 · Updated 2 years ago
- Minimal Academic Website Template · ☆14 · Feb 20, 2025 · Updated last year
- A simple visual test-time scaling method for GUI agent grounding · ☆21 · Dec 7, 2025 · Updated 4 months ago
- Software Engineering Economy | Tongji Univ. SSE Course Design · ☆11 · Sep 19, 2020 · Updated 5 years ago
- Implementation of <Model Merging with Functional Dual Anchors> · ☆47 · Nov 23, 2025 · Updated 4 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models · ☆83 · Jan 21, 2026 · Updated 2 months ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024 · ☆32 · Jul 18, 2024 · Updated last year
- Models and code for the ICLR 2020 workshop paper "Towards Understanding Normalization in Neural ODEs" · ☆16 · Apr 27, 2020 · Updated 5 years ago
- ☆36 · Updated this week
- [SIGGRAPH Asia 2025] Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization · ☆35 · Nov 30, 2025 · Updated 4 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces · ☆88 · Jun 6, 2025 · Updated 10 months ago
- Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training fo… · ☆19 · Oct 20, 2025 · Updated 5 months ago
- Blending Custom Photos with Video Diffusion Transformers · ☆48 · Jan 21, 2025 · Updated last year
- Open-source self-hosted password manager built with Flutter. Store passwords and crypto seed phrases securely without cloud storage. · ☆53 · Updated this week
- The official codes of Learning to Decouple the Lights for 3D Face Texture Modeling (NeurIPS'24) · ☆14 · Mar 17, 2025 · Updated last year
- ☆13 · Mar 9, 2024 · Updated 2 years ago
- ☆30 · Aug 21, 2025 · Updated 7 months ago
- ☆17 · Apr 17, 2025 · Updated 11 months ago
- A simple Read-It-Later and link collection tool, AI-powered for text and images, multi-platform, open-source. A browser extension availab… · ☆12 · May 13, 2025 · Updated 10 months ago
- collab-dev - Collaboration Metrics for Code Reviews · ☆23 · May 12, 2025 · Updated 10 months ago