fudan-zvg / 4D-VLA

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration

☆38

Alternatives and similar repositories for 4D-VLA

Users who are interested in 4D-VLA are comparing it to the repositories listed below.

InternRobotics / VLM-Grounder
[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
☆115 Updated 4 months ago
HaoyiZhu / SPA
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
☆167 Updated 4 months ago
GigaAI-research / VLA-R1
☆19 Updated this week
antonioo-c / GeoDrive
☆54 Updated 4 months ago
YkiWu / EmbodiedOcc
[ICCV 2025] Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
☆61 Updated 9 months ago
PKU-HMI-Lab / LIFT3D
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
☆163 Updated 4 months ago
LiAutoAD / DIVE
☆28 Updated last year
wzzheng / Stag
[ICCV 2025] Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model
☆83 Updated 10 months ago
OpenHelix-Team / ReconVLA
Official implementation of ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver.
☆58 Updated 3 weeks ago
wzzheng / GaussianFormer
Project Page for GaussianFormer
☆24 Updated last year
OpenDriveLab / DetAny3D
[ICCV 2025] Detect Anything 3D in the Wild
☆209 Updated 3 months ago
xuxw98 / Online3D
[CVPR 2024] Memory-based Adapters for Online 3D Scene Perception
☆120 Updated 6 months ago
Lizhuoling / UniMODE
☆44 Updated 5 months ago
OpenHelix-Team / Spatial-Forcing
Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
☆46 Updated this week
xiaobiaodu / DreamCar
[RA-L 2024] DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction
☆81 Updated last year
iris0329 / SeeGround
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
☆178 Updated 5 months ago
xiaomi-research / genesis
[NeurIPS 2025] Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
☆62 Updated last month
HaoyiZhu / RealRobot
Open-source implementations on real robots
☆34 Updated 10 months ago
Zhoues / RoboRefer
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆173 Updated 2 weeks ago
m2diffuser / M2Diffuser
Official implementation of T-PAMI25 paper "M²Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes"
☆91 Updated 4 months ago
MINT-SJTU / Evo-VLA
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding.
☆37 Updated 3 months ago
jxbbb / TOD3Cap
[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
☆129 Updated 7 months ago
hongxiaoy / ISO
[ECCV 2024] Monocular Occupancy Prediction for Scalable Indoor Scenes
☆61 Updated last year
Zhangwenyao1 / DreamVLA
[NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
☆197 Updated last month
GWxuan / IGL-Nav
[ICCV 2025] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
☆45 Updated 2 months ago
gusongen / DOME
Official code of "DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model"
☆50 Updated 9 months ago
VITA-Group / VLM-3R
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆278 Updated last month
AIGeeksGroup / Nav-R1
Nav-R1: Reasoning and Navigation in Embodied Scenes
☆58 Updated 2 weeks ago
OpenHelix-Team / LLaVA-VLA
LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]
☆161 Updated 3 weeks ago
cancaries / SceneCrafter
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
☆29 Updated this week