OpenRobotLab / MMSI-Bench

[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

☆37

Alternatives and similar repositories for MMSI-Bench

Users who are interested in MMSI-Bench are comparing it to the repositories listed below.

MSR3D / MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
☆60 Updated last week
Zhoues / RoboRefer
Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆65 Updated this week
sg-3d / sg3d
☆49 Updated 8 months ago
TencentARC / Moto
Latent Motion Token as the Bridging Language for Robot Manipulation
☆105 Updated last month
jmwang0117 / Video4Robot
List of papers on video-centric robot learning
☆21 Updated 7 months ago
ZCMax / ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
☆74 Updated 8 months ago
HeegerGao / FLIP
Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
☆67 Updated 6 months ago
YunzeMan / Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
☆38 Updated 6 months ago
MARS-EAI / RoboFactory
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
☆49 Updated 3 weeks ago
qizekun / SoFar
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
☆165 Updated last month
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
☆117 Updated 3 weeks ago
Haochen-Wang409 / ross3d
Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".
☆29 Updated 2 weeks ago
michaelyuancb / general_flow
Repository for "General Flow as Foundation Affordance for Scalable Robot Learning"
☆56 Updated 6 months ago
MCG-NJU / Tra-MoE
[CVPR 2025] Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
☆36 Updated 2 months ago
Dantong88 / LLARVA
☆46 Updated 6 months ago
Fanqi-Lin / OneTwoVLA
Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"
☆106 Updated 3 weeks ago
OpenRobotLab / VLM-Grounder
[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
☆107 Updated last month
OpenRobotLab / Grounded_3D-LLM
Code & Data for Grounded 3D-LLM with Referent Tokens
☆120 Updated 5 months ago
maitrix-org / SimWorld
Main repo for SimWorld simulator.
☆37 Updated this week
staymylove / 3DMIT
Code of 3DMIT: 3D Multi-Modal Instruction Tuning for Scene Understanding
☆30 Updated 11 months ago
Tengbo-Yu / AnyBimanual
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
☆77 Updated 2 months ago
jiaming-zhou / X-ICM
Official repo for AGNOSTOS, a cross-task manipulation benchmark, and X-ICM, a cross-task in-context manipulation (VLA) method
☆29 Updated last month
ATR-DBI / ScanQA
☆124 Updated last year
yl3800 / LASO
☆33 Updated 10 months ago
fudan-zvg / spar
☆47 Updated last month
RoboDita / Dita
☆95 Updated last month
HeegerGao / VLA-OS
Official code for VLA-OS.
☆17 Updated this week
HaoyiZhu / PointCloudMatters
[NeurIPS 2024 D&B] Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
☆80 Updated 8 months ago
UMass-Embodied-AGI / MultiPLY
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
☆130 Updated 8 months ago
JeffWang987 / EgoVid
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
☆108 Updated 7 months ago