MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
☆57 · Mar 11, 2026 · Updated last month
Alternatives and similar repositories for MMSI-Video-Bench
Users who are interested in MMSI-Video-Bench are comparing it to the libraries listed below.
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding · ☆76 · Sep 29, 2025 · Updated 7 months ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence · ☆86 · Apr 22, 2026 · Updated last week
- The official repo for the EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimo… · ☆21 · Apr 9, 2025 · Updated last year
- InternRobotics' open-source toolbox for vision-based embodied spatial intelligence. · ☆48 · Sep 18, 2025 · Updated 7 months ago
- ☆69 · Apr 3, 2026 · Updated 3 weeks ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) · ☆30 · Oct 28, 2025 · Updated 6 months ago
- ☆22 · Mar 7, 2025 · Updated last year
- Implementation of <Symbolic Graphics Programming with Large Language Models> · ☆38 · Sep 14, 2025 · Updated 7 months ago
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap". · ☆134 · Apr 3, 2026 · Updated 3 weeks ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models · ☆24 · Apr 18, 2026 · Updated last week
- A PyTorch implementation of 3DRefTR, proposed in our paper "A Unified Framework for 3D Point Cloud Visual Grounding" · ☆26 · Aug 24, 2023 · Updated 2 years ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… · ☆88 · Mar 9, 2026 · Updated last month
- [ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging · ☆90 · Apr 21, 2026 · Updated last week
- VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation · ☆18 · Jun 2, 2025 · Updated 10 months ago
- Understanding what physics/algorithms transformers learn internally when trained on planetary motion · ☆41 · Feb 9, 2026 · Updated 2 months ago
- [NeurIPS 2025] Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking · ☆30 · Mar 18, 2026 · Updated last month
- Code of the paper "HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks" · ☆24 · Oct 8, 2025 · Updated 6 months ago
- [NeurIPS 2024] CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting · ☆20 · Dec 31, 2024 · Updated last year
- Remasking Discrete Diffusion Models with Inference-Time Scaling · ☆73 · Feb 7, 2026 · Updated 2 months ago
- [CVPR 2025] Mosaic3D: Foundation Dataset and Model for Open-vocabulary 3D Segmentation · ☆67 · Jan 6, 2026 · Updated 3 months ago
- Official implementation of "What does CLIP know about a red circle? Visual Prompt Engineering for VLMs", ICCV 2023 · ☆12 · Sep 21, 2023 · Updated 2 years ago
- Cambrian-S: Towards Spatial Supersensing in Video · ☆538 · Apr 3, 2026 · Updated 3 weeks ago
- ECCV 2026 paper template · ☆41 · Jan 23, 2026 · Updated 3 months ago
- ☆11 · Mar 22, 2024 · Updated 2 years ago
- Hy3 preview (295B A21B), a leading reasoning and agent model for its size, with great cost efficiency · ☆234 · Updated this week
- Minimal Academic Website Template · ☆16 · Feb 20, 2025 · Updated last year
- A simple visual test-time scaling method for GUI agent grounding · ☆24 · Dec 7, 2025 · Updated 4 months ago
- Implementation of <Model Merging with Functional Dual Anchors> · ☆47 · Nov 23, 2025 · Updated 5 months ago
- Video Reasoning Segmentation · ☆27 · Nov 29, 2024 · Updated last year
- Official repo for ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models · ☆28 · Mar 24, 2025 · Updated last year
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models · ☆83 · Jan 21, 2026 · Updated 3 months ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024 · ☆32 · Jul 18, 2024 · Updated last year
- ☆125 · Feb 24, 2026 · Updated 2 months ago
- Models and code for the ICLR 2020 workshop paper "Towards Understanding Normalization in Neural ODEs" · ☆16 · Apr 27, 2020 · Updated 6 years ago
- ☆43 · Apr 8, 2026 · Updated 3 weeks ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces · ☆87 · Jun 6, 2025 · Updated 10 months ago
- Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training fo… · ☆20 · Oct 20, 2025 · Updated 6 months ago
- Blending Custom Photos with Video Diffusion Transformers · ☆50 · Jan 21, 2025 · Updated last year
- Open-source self-hosted password manager built with Flutter. Store passwords and crypto seed phrases securely without cloud storage. · ☆62 · Apr 18, 2026 · Updated last week