fudan-zvg / 4D-VLA
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
☆38Updated 3 months ago
Alternatives and similar repositories for 4D-VLA
Users who are interested in 4D-VLA are comparing it to the repositories listed below.
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding☆115Updated 4 months ago
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation☆167Updated 4 months ago
- ☆19Updated this week
- ☆54Updated 4 months ago
- [ICCV 2025] Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding☆61Updated 9 months ago
- [CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation☆163Updated 4 months ago
- ☆28Updated last year
- [ICCV 2025] Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model☆83Updated 10 months ago
- Official implementation of ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver.☆58Updated 3 weeks ago
- Project Page for GaussianFormer☆24Updated last year
- [ICCV 2025] Detect Anything 3D in the Wild☆209Updated 3 months ago
- [CVPR 2024] Memory-based Adapters for Online 3D Scene Perception☆120Updated 6 months ago
- ☆44Updated 5 months ago
- Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model☆46Updated this week
- [RA-L 2024] DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction☆81Updated last year
- [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding☆178Updated 5 months ago
- [NeurIPS 2025] Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency☆62Updated last month
- Open-source implementations on real robots☆34Updated 10 months ago
- [NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆173Updated 2 weeks ago
- Official implementation of T-PAMI25 paper "M²Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes"☆91Updated 4 months ago
- Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding.☆37Updated 3 months ago
- [ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes☆129Updated 7 months ago
- [ECCV 2024] Monocular Occupancy Prediction for Scalable Indoor Scenes☆61Updated last year
- [NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge☆197Updated last month
- [ICCV 2025] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation☆45Updated 2 months ago
- Official code of *DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model*☆50Updated 9 months ago
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction☆278Updated last month
- Nav-R1: Reasoning and Navigation in Embodied Scenes☆58Updated 2 weeks ago
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]☆161Updated 3 weeks ago
- Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving☆29Updated this week