OpenRobotLab / OST-Bench

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

☆41

Alternatives and similar repositories for OST-Bench

Users who are interested in OST-Bench are comparing it to the repositories listed below.

baaivision / UniVLA
Unified Vision-Language-Action Model
☆108 Updated last week
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
☆133 Updated last month
OpenRobotLab / MMSI-Bench
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆42 Updated last week
MSR3D / MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
☆60 Updated 3 weeks ago
Haochen-Wang409 / ross3d
Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".
☆35 Updated 2 weeks ago
OpenRobotLab / Grounded_3D-LLM
Code & Data for Grounded 3D-LLM with Referent Tokens
☆123 Updated 6 months ago
JeffWang987 / EgoVid
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
☆109 Updated 7 months ago
ZCMax / ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
☆75 Updated 9 months ago
mll-lab-nu / MindCube
☆69 Updated 2 weeks ago
YunzeMan / Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
☆39 Updated 7 months ago
diankun-wu / Spatial-MLLM
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆285 Updated 3 weeks ago
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆156 Updated 2 months ago
TencentARC / Moto
[ICCV 2025] Latent Motion Token as the Bridging Language for Robot Manipulation
☆110 Updated 2 months ago
ZCMax / LLaVA-3D
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
☆281 Updated this week
OpenRobotLab / StreamVLN
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
☆76 Updated this week
Zhoues / RoboRefer
Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆96 Updated last week
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆215 Updated 6 months ago
Little-Podi / AdaWorld
[ICML'25] The PyTorch implementation of the paper "AdaWorld: Learning Adaptable World Models with Latent Actions".
☆125 Updated 3 weeks ago
sg-3d / sg3d
☆49 Updated 9 months ago
HaoyiZhu / SPA
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
☆162 Updated 3 weeks ago
fudan-zvg / spar
☆48 Updated last month
facebookresearch / Multi-SpatialMLLM
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆133 Updated last month
Dantong88 / LLARVA
☆49 Updated 6 months ago
alibaba-damo-academy / WorldVLA
WorldVLA: Towards Autoregressive Action World Model
☆248 Updated last week
Ivan-Tang-3D / ENEL
The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs"
☆53 Updated last month
thuml / iVideoGPT
Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223
☆136 Updated last month
maitrix-org / SimWorld
Main repo for SimWorld simulator.
☆53 Updated 3 weeks ago
SilongYong / SQA3D
[ICLR 2023] SQA3D for embodied scene understanding and reasoning
☆134 Updated last year
ATR-DBI / ScanQA
☆126 Updated last year
GenEx-world / genex
Generative World Explorer
☆148 Updated 3 weeks ago