UMass-Embodied-AGI/3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

☆629

Alternatives and similar repositories for 3D-VLA

Users who are interested in 3D-VLA are comparing it to the repositories listed below.

octo-models / octo
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
☆1,721 · Jul 31, 2024 · Updated last year
UMass-Embodied-AGI / MultiPLY
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
☆135 · Oct 24, 2024 · Updated last year
simpler-env / SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Goo…
☆1,129 · Dec 20, 2025 · Updated 7 months ago
UMass-Embodied-AGI / 3D-LLM
Code for 3D-LLM: Injecting the 3D World into Large Language Models
☆1,211 · Jun 6, 2024 · Updated 2 years ago
embodied-generalist / embodied-generalist
[ICML 2024] LEO: An Embodied Generalist Agent in 3D World
☆487 · Apr 20, 2025 · Updated last year
zhouxian / act3d-chained-diffuser
A unified architecture for multimodal multi-task robotic policy learning.
☆185 · Feb 2, 2024 · Updated 2 years ago
LatentActionPretraining / LAPA
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
☆561 · Jan 22, 2025 · Updated last year
Robot-VLAs / RoboVLMs
☆475 · Apr 14, 2026 · Updated 3 months ago
SpatialVLA / SpatialVLA
🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes. Accepted at RSS 2025.
☆710 · Jun 23, 2025 · Updated last year
OpenDriveLab / UniVLA
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
☆1,114 · Nov 19, 2025 · Updated 8 months ago
YanjieZe / 3D-Diffusion-Policy
[RSS 2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
☆1,415 · Oct 17, 2025 · Updated 9 months ago
flow-diffusion / AVDC
Official repository of Learning to Act from Actionless Videos through Dense Correspondences.
☆262 · Apr 25, 2024 · Updated 2 years ago
thu-ml / RoboticsDiffusionTransformer
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
☆1,764 · Jan 21, 2026 · Updated 6 months ago
Large-Trajectory-Model / ATM
Official codebase for "Any-point Trajectory Modeling for Policy Learning"
☆278 · Jun 19, 2025 · Updated last year
nickgkan / 3d_diffuser_actor
Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
☆392 · Aug 17, 2024 · Updated last year
bytedance / GR-1
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
☆310 · Apr 22, 2024 · Updated 2 years ago
rainbow979 / robodreamer
☆102 · Sep 4, 2024 · Updated last year
ShuangLI59 / unified_video_action
Official PyTorch Implementation of Unified Video Action Model (RSS 2025)
☆400 · Jul 23, 2025 · Updated last year
InternRobotics / EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
☆672 · Jun 13, 2025 · Updated last year
huangwl18 / VoxPoser
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
☆826 · Feb 20, 2025 · Updated last year
huangwl18 / ReKep
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
☆976 · Feb 20, 2025 · Updated last year
openvla / openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
☆6,719 · Mar 23, 2025 · Updated last year
GuanxingLu / ManiGaussian
[ECCV 2024] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
☆281 · Mar 29, 2026 · Updated 3 months ago
microsoft / CogACT
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
☆430 · Oct 30, 2025 · Updated 8 months ago
zubair-irshad / Awesome-Robotics-3D
A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vi…
☆818 · Dec 17, 2025 · Updated 7 months ago
moojink / openvla-oft
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
☆1,317 · Sep 9, 2025 · Updated 10 months ago
PKU-HMI-Lab / Hybrid-VLA
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
☆352 · Oct 3, 2025 · Updated 9 months ago
PKU-HMI-Lab / LIFT3D
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
☆185 · Jun 20, 2025 · Updated last year
UMass-Embodied-AGI / TesserAct
ICCV 2025 | TesserAct: Learning 4D Embodied World Models
☆403 · Aug 4, 2025 · Updated 11 months ago
SiyuanHuang95 / ManipVQA
[IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
☆102 · Aug 22, 2024 · Updated last year
OpenMOSS / VLABench
Official repo of VLABench, a large scale benchmark designed for fairly evaluating VLA, Embodied Agent, and VLMs.
☆454 · Nov 11, 2025 · Updated 8 months ago
OpenDriveLab / AgiBot-World
[IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
☆3,107 · May 29, 2026 · Updated 2 months ago
dyson-ai / hdp
[CVPR 2024] Hierarchical Diffusion Policy for Multi-Task Robotic Manipulation
☆238 · Apr 9, 2024 · Updated 2 years ago
allenzren / open-pi-zero
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
☆1,507 · Jan 31, 2025 · Updated last year
BAAI-DCAI / SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
☆349 · Updated this week
changhaonan / A3VLM
[CoRL2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`
☆122 · Oct 7, 2024 · Updated last year
robocasa / robocasa
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
☆1,585 · Jul 8, 2026 · Updated 3 weeks ago
real-stanford / im2Flow2Act
[CoRL 2024] Im2Flow2Act: Flow as the Cross-domain Manipulation Interface
☆161 · Oct 17, 2024 · Updated last year
PRIME-RL / SimpleVLA-RL
[ICLR 2026] SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
☆1,798 · Jan 6, 2026 · Updated 6 months ago