BAAI-DCAI / SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
⭐281 · Updated last month
Alternatives and similar repositories for SpatialBot
Users interested in SpatialBot are comparing it to the repositories listed below.
- 🔥 SpatialVLA: a spatially enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025. · ⭐392 · Updated 3 weeks ago
- WorldVLA: Towards Autoregressive Action World Model · ⭐248 · Updated last week
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model · ⭐254 · Updated last month
- [CVPR 2025] The official implementation of "Universal Actions for Enhanced Embodied Foundation Models" · ⭐184 · Updated 3 months ago
- [ICML 2024] Official code repository for the 3D embodied generalist agent LEO · ⭐446 · Updated 2 months ago
- A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation · ⭐301 · Updated last month
- [ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model · ⭐540 · Updated 8 months ago
- Repository for the paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation" · ⭐128 · Updated 6 months ago
- ⭐372 · Updated 5 months ago
- SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation · ⭐175 · Updated 2 weeks ago
- Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations (https://video-prediction-policy.github.io) · ⭐225 · Updated last month
- Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning" · ⭐147 · Updated last month
- ICCV 2025 · ⭐103 · Updated 2 weeks ago
- [CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation · ⭐149 · Updated 3 weeks ago
- [RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions · ⭐551 · Updated 2 weeks ago
- OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation · ⭐216 · Updated last month
- Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation" · ⭐263 · Updated last year
- ⭐242 · Updated 3 months ago
- Latest advances on vision-language-action models. · ⭐83 · Updated 4 months ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks · ⭐144 · Updated last month
- [ICLR 2025] LAPA: Latent Action Pretraining from Videos · ⭐326 · Updated 5 months ago
- [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official repository. · ⭐263 · Updated last month
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained 🔥] · ⭐88 · Updated this week
- GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization · ⭐132 · Updated 3 months ago
- The official implementation of RoboMatrix · ⭐93 · Updated last month
- The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024) · ⭐133 · Updated last year
- Official repo of VLABench, a large-scale benchmark designed for fairly evaluating VLAs, embodied agents, and VLMs. · ⭐253 · Updated 2 weeks ago
- 🤖 RoboOS: A Universal Embodied Operating System for Cross-Embodied and Multi-Robot Collaboration · ⭐115 · Updated last week
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World · ⭐130 · Updated 8 months ago
- Single-file implementation to advance vision-language-action (VLA) models with reinforcement learning. · ⭐163 · Updated last week