MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
☆57Mar 11, 2026Updated 2 months ago
Alternatives and similar repositories for MMSI-Video-Bench
Users that are interested in MMSI-Video-Bench are comparing it to the libraries listed below.
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding☆76Sep 29, 2025Updated 8 months ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence☆95Apr 28, 2026Updated last month
- the official repo for EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimo…☆21Apr 9, 2025Updated last year
- InternRobotics' open-source toolbox for vision-based embodied spatial intelligence.☆48Sep 18, 2025Updated 8 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)☆30Oct 28, 2025Updated 7 months ago
- ☆22Mar 7, 2025Updated last year
- ☆86Apr 3, 2026Updated 2 months ago
- Implementation of <Symbolic Graphics Programming with Large Language Models>☆38Sep 14, 2025Updated 8 months ago
- [NeurIPS 2023] MoVie: Visual Model-Based Policy Adaptation for View Generalization☆11Sep 22, 2023Updated 2 years ago
- [ICML 2026 Oral] Minimalist RL for Diffusion LLMs. 89.1% on GSM8K.☆145May 26, 2026Updated 2 weeks ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Apr 18, 2026Updated last month
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆97Mar 9, 2026Updated 3 months ago
- ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models☆16Sep 27, 2024Updated last year
- VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation☆20Jun 2, 2025Updated last year
- [ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging☆104Apr 30, 2026Updated last month
- [NeurIPS 2025] Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking☆31May 7, 2026Updated last month
- Code of paper "HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks"☆24Oct 8, 2025Updated 8 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling☆75Feb 7, 2026Updated 4 months ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆106Mar 15, 2026Updated 2 months ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens☆19Feb 29, 2024Updated 2 years ago
- Restore the old-style Bilibili page, for those who feel nostalgic.☆23May 31, 2026Updated last week
- Cambrian-S: Towards Spatial Supersensing in Video☆549Apr 3, 2026Updated 2 months ago
- ECCV 2026 paper template☆40Jan 23, 2026Updated 4 months ago
- PaperBot: Learning to Design Real-World Tools Using Paper☆13Mar 15, 2024Updated 2 years ago
- Official repo and evaluation implementation of KnowRecall and VisRecall☆10May 22, 2025Updated last year
- ☆11Mar 22, 2024Updated 2 years ago
- A simple visual test-time scaling method for GUI agent grounding☆26Dec 7, 2025Updated 6 months ago
- Software Engineering Economy | Tongji Univ. SSE Course Design☆11Sep 19, 2020Updated 5 years ago
- Implementation of <Model Merging with Functional Dual Anchors>☆47Nov 23, 2025Updated 6 months ago
- Video Reasoning Segmentation☆27Nov 29, 2024Updated last year
- This repo provides methods for building and evaluating Retrieval Augmented Generation (RAG) systems.☆18Sep 25, 2024Updated last year
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆87Jan 21, 2026Updated 4 months ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024☆32Jul 18, 2024Updated last year
- Models and code for the ICLR 2020 workshop paper "Towards Understanding Normalization in Neural ODEs"☆16Apr 27, 2020Updated 6 years ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces☆86Jun 6, 2025Updated last year
- The official codes of Learning to Decouple the Lights for 3D Face Texture Modeling (NeurIPS'24)☆14Mar 17, 2025Updated last year
- SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards☆39Jan 28, 2026Updated 4 months ago
- ☆39Feb 3, 2026Updated 4 months ago