GigaAI-research / SwiftVLA
☆53 · Updated last month
Alternatives and similar repositories for SwiftVLA
Users who are interested in SwiftVLA are comparing it to the repositories listed below.
- [ICCV 2025] Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding ☆69 · Updated last year
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆128 · Updated 8 months ago
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ☆172 · Updated 7 months ago
- 4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration. Accepted to NeurIPS 2025. ☆47 · Updated 3 weeks ago
- [NeurIPS 2025] Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency ☆75 · Updated 4 months ago
- Official implementation of "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction" ☆56 · Updated 2 months ago
- [CVPR2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos ☆184 · Updated 4 months ago
- [ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes ☆128 · Updated 10 months ago
- [AAAI 2026] WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving ☆25 · Updated last month
- ☆61 · Updated 7 months ago
- [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding ☆205 · Updated 9 months ago
- Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding. ☆52 · Updated 2 months ago
- [ECCV 2024] Monocular Occupancy Prediction for Scalable Indoor Scenes ☆66 · Updated last year
- Official implementation of "Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation" (NeurIPS'25 Oral) ☆71 · Updated last month
- [CVPR 2024] Memory-based Adapters for Online 3D Scene Perception ☆125 · Updated 10 months ago
- Official code of *DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model* ☆61 · Updated last year
- Project Page for GaussianFormer ☆24 · Updated last year
- [ICCV 2025] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation ☆63 · Updated 5 months ago
- ☆137 · Updated last month
- [ICCV 2025] Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model ☆96 · Updated last year
- Official GitHub repo for GEM ☆101 · Updated 3 months ago
- Official implementation of “4D LangVGGT: 4D Language-Visual Geometry Grounded Transformer” ☆76 · Updated last month
- ☆54 · Updated last year
- ☆227 · Updated 5 months ago
- Nav-R1: Reasoning and Navigation in Embodied Scenes ☆108 · Updated 3 months ago
- [ICCV 2025] Detect Anything 3D in the Wild ☆245 · Updated last month
- Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer ☆28 · Updated 2 months ago
- [CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding ☆203 · Updated 3 weeks ago
- [ECCV24] Navigation Instruction Generation with BEV Perception and Large Language Models ☆30 · Updated last year
- [NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge ☆282 · Updated 3 weeks ago