[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
☆71 Sep 29, 2025 Updated 5 months ago
Alternatives and similar repositories for OST-Bench
Users who are interested in OST-Bench are comparing it to the libraries listed below
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence ☆55 Feb 10, 2026 Updated 2 weeks ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆77 Feb 16, 2026 Updated last week
- InternRobotics' open-source toolbox for vision-based embodied spatial intelligence. ☆47 Sep 18, 2025 Updated 5 months ago
- [AAAI26 oral] CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling ☆88 Jan 11, 2026 Updated last month
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation ☆32 Jan 10, 2026 Updated last month
- ☆47 Apr 20, 2025 Updated 10 months ago
- [ICRA 2026] Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling" ☆407 Nov 2, 2025 Updated 3 months ago
- [SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling ☆44 Sep 26, 2025 Updated 5 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing ☆31 Oct 2, 2025 Updated 4 months ago
- [ICLR 2026] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation ☆102 Jan 27, 2026 Updated last month
- InternRobotics' open platform for building generalized navigation foundation models. ☆688 Feb 11, 2026 Updated 2 weeks ago
- Official implementation of "OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes". ☆90 Jan 14, 2026 Updated last month
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 Dec 23, 2024 Updated last year
- ☆22 Jan 12, 2026 Updated last month
- [NeurIPS 2023] MoVie: Visual Model-Based Policy Adaptation for View Generalization ☆11 Sep 22, 2023 Updated 2 years ago
- Symbolic Graphics Programming with Large Language Models ☆37 Sep 14, 2025 Updated 5 months ago
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ☆652 Jun 13, 2025 Updated 8 months ago
- ☆50 Jun 4, 2025 Updated 8 months ago
- [ICLR26] GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆103 Jan 27, 2026 Updated last month
- [ICCV 2025] GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene ☆167 Feb 16, 2026 Updated last week
- ☆98 Jun 23, 2025 Updated 8 months ago
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆81 Oct 10, 2024 Updated last year
- ☆15 Sep 22, 2025 Updated 5 months ago
- [SIGGRAPH Asia 2025] 4DSloMo: 4D Reconstruction for High Speed Scene with Asynchronous Capture ☆132 Oct 27, 2025 Updated 4 months ago
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation ☆92 Jun 24, 2024 Updated last year
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" ☆24 Dec 14, 2025 Updated 2 months ago
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba (Official Implementation) ☆17 Oct 24, 2024 Updated last year
- An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation ☆16 Oct 27, 2024 Updated last year
- [ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ☆163 Oct 1, 2025 Updated 4 months ago
- Code for "CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects", NeurIPS 2025 ☆87 Jan 6, 2026 Updated last month
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ☆33 Feb 1, 2026 Updated 3 weeks ago
- [RSS 2025] Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation ☆76 Jul 22, 2025 Updated 7 months ago
- An All-in-one robot manipulation learning suite for policy models training and evaluation on various datasets and benchmarks. ☆169 Oct 15, 2025 Updated 4 months ago
- A unified robotic manipulation learning framework ☆21 Sep 4, 2025 Updated 5 months ago
- ☆184 Jul 25, 2025 Updated 7 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆78 Jan 21, 2026 Updated last month
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆29 Jun 3, 2025 Updated 8 months ago
- Official repo for StyleMe3D ☆28 Apr 22, 2025 Updated 10 months ago
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models ☆30 Oct 6, 2025 Updated 4 months ago