MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
☆55Feb 10, 2026Updated 2 weeks ago
Alternatives and similar repositories for MMSI-Video-Bench
Users who are interested in MMSI-Video-Bench are comparing it to the libraries listed below
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding☆71Sep 29, 2025Updated 5 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Jan 1, 2026Updated last month
- ☆49Feb 13, 2026Updated 2 weeks ago
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap".☆119Jan 24, 2026Updated last month
- ☆22Mar 7, 2025Updated 11 months ago
- the official repo for EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimo…☆20Apr 9, 2025Updated 10 months ago
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Aug 24, 2023Updated 2 years ago
- InternRobotics' open-source toolbox for vision-based embodied spatial intelligence.☆47Sep 18, 2025Updated 5 months ago
- Video Reasoning Segmentation☆28Nov 29, 2024Updated last year
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆57Jan 23, 2026Updated last month
- sora2 free watermark remover☆767Feb 20, 2026Updated last week
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…☆12Oct 14, 2024Updated last year
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆73Feb 13, 2026Updated 2 weeks ago
- Cambrian-S: Towards Spatial Supersensing in Video☆497Dec 27, 2025Updated 2 months ago
- ☆14Feb 13, 2026Updated 2 weeks ago
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 2 months ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆18Jul 10, 2025Updated 7 months ago
- Official code of DMA: Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding, ECCV 2024☆31Jul 18, 2024Updated last year
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- [CVPR 2026] Official repo for "VideoSSR: Video Self-Supervised Reinforcement Learning"☆32Nov 11, 2025Updated 3 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆78Jan 21, 2026Updated last month
- ☆11Jan 18, 2025Updated last year
- Understand what physics/algorithms do transformers learn internally when trained on planetary motion☆35Feb 9, 2026Updated 2 weeks ago
- ☆10Apr 7, 2025Updated 10 months ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- SimX-OR: Extending Any Simulation Benchmark to Evaluate the Observational Robustness of VLA Models☆31Nov 4, 2025Updated 3 months ago
- ☆31Feb 3, 2026Updated 3 weeks ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence☆77Feb 16, 2026Updated last week
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- Code of paper "HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks"☆22Oct 8, 2025Updated 4 months ago
- Use MCL in Mirai Console to manage packages and other advanced features☆10Nov 13, 2022Updated 3 years ago
- ☆13Jan 21, 2025Updated last year
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"☆38Oct 9, 2025Updated 4 months ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆12Feb 27, 2024Updated 2 years ago
- ☆14Nov 23, 2024Updated last year
- ☆10Jan 9, 2025Updated last year
- The official repository of UVOSAM☆13Jun 5, 2024Updated last year
- Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection".☆11Nov 13, 2024Updated last year
- Symbolic Graphics Programming with Large Language Models☆37Sep 14, 2025Updated 5 months ago