ustcwhy / BitVLA
Official implementation for BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
☆54 · Updated last week
Alternatives and similar repositories for BitVLA
Users interested in BitVLA are comparing it to the repositories listed below.
- Nvidia GEAR Lab's initiative to solve the robotics data problem using world models ☆205 · Updated 2 weeks ago
- NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks ☆146 · Updated last week
- Unified Vision-Language-Action Model ☆128 · Updated 2 weeks ago
- WorldVLA: Towards Autoregressive Action World Model ☆268 · Updated last week
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆64 · Updated last month
- Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective ☆110 · Updated 2 weeks ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆68 · Updated 2 months ago
- [ICML'25] The PyTorch implementation of the paper "AdaWorld: Learning Adaptable World Models with Latent Actions" ☆125 · Updated last month
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆133 · Updated last month
- Virtual Community: An Open World for Humans, Robots, and Society ☆142 · Updated 2 weeks ago
- ☆76 · Updated last month
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment ☆78 · Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆156 · Updated 2 months ago
- Official code of the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" ☆99 · Updated 5 months ago
- A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation ☆33 · Updated 3 months ago
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ☆214 · Updated 3 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆75 · Updated last month
- Official repository of "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation" ☆100 · Updated last month
- Repository of the paper `RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation` ☆128 · Updated 6 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆95 · Updated 3 months ago
- ☆75 · Updated 10 months ago
- PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability ☆18 · Updated 3 months ago
- Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets ☆93 · Updated last month
- Official implementation for the project RUKA: Rethinking the Design of Humanoid Hands with Learning. Project website: https://ruka-hand.g… ☆102 · Updated this week
- Improving 3D Large Language Model via Robust Instruction Tuning ☆60 · Updated 4 months ago
- AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World ☆75 · Updated last month
- Distributed, scalable benchmarking of generalist robot policies ☆35 · Updated 3 weeks ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks ☆146 · Updated last month
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆96 · Updated last week
- Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning" ☆147 · Updated last month