OpenGVLab / VeBrain
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
☆77 · Updated 2 months ago
Alternatives and similar repositories for VeBrain
Users interested in VeBrain are comparing it to the repositories listed below
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆109 · Updated last week
- ☆41 · Updated last month
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆68 · Updated 2 months ago
- 🦾 A Dual-System VLA with System2 Thinking ☆84 · Updated 3 weeks ago
- Unified Vision-Language-Action Model ☆170 · Updated 2 weeks ago
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ☆58 · Updated 2 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆76 · Updated 5 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆115 · Updated last week
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆66 · Updated 2 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆140 · Updated 2 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆61 · Updated 4 months ago
- ☆77 · Updated 11 months ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆52 · Updated this week
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆81 · Updated 2 months ago
- Code for Stable Control Representations ☆25 · Updated 4 months ago
- ☆82 · Updated last week
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge ☆138 · Updated last week
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆47 · Updated last month
- ☆71 · Updated 8 months ago
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆137 · Updated 2 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆130 · Updated 9 months ago
- ☆26 · Updated 3 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆60 · Updated 10 months ago
- Evaluate Multimodal LLMs as Embodied Agents ☆52 · Updated 5 months ago
- WorldVLA: Towards Autoregressive Action World Model ☆310 · Updated last month
- Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆70 · Updated last week
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆97 · Updated last month
- ☆23 · Updated last week
- [ICCV 2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ☆120 · Updated 2 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆57 · Updated 10 months ago