hzxie / DynamicVLA
The official implementation of "DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation" (arXiv 2601.22153).
☆69 · Updated this week
Alternatives and similar repositories for DynamicVLA
Users interested in DynamicVLA are comparing it to the libraries listed below.
- [NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆125 · Updated 2 months ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" ☆124 · Updated last month
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆167 · Updated 3 months ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding ☆47 · Updated 4 months ago
- VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning ☆121 · Updated 3 months ago
- ☆178 · Updated last week
- Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer ☆28 · Updated 2 months ago
- The official implementation of Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight ☆75 · Updated 2 weeks ago
- The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025) ☆108 · Updated 2 months ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆34 · Updated this week
- Codebase for the paper "Geometry-aware 4D Video Generation for Robot Manipulation" ☆71 · Updated 3 weeks ago
- Official code for the paper "N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models" ☆81 · Updated 2 weeks ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆87 · Updated 7 months ago
- Official repo of "From Masks to Worlds: A Hitchhiker’s Guide to World Models" ☆71 · Updated 3 months ago
- [ICLR’26] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control ☆94 · Updated 6 months ago
- StereoVLA is powered by stereo vision and supports flexible deployment with high tolerance to camera pose variations. ☆48 · Updated 2 weeks ago
- Unifying 2D and 3D Vision-Language Understanding ☆121 · Updated 6 months ago
- SPAgent, a spatial intelligence agent designed to operate in the physical and spatial world ☆85 · Updated this week
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆79 · Updated last week
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆33 · Updated last month
- [NeurIPS 2025] FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens ☆19 · Updated 3 months ago
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ☆172 · Updated 7 months ago
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts ☆219 · Updated 3 months ago
- Official implementation of BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation ☆102 · Updated 6 months ago
- Official code for "Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation" ☆121 · Updated 5 months ago
- Official implementation of the paper "WMPO: World Model-based Policy Optimization for Vision-Language-Action Models" ☆137 · Updated 3 weeks ago
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ☆70 · Updated 4 months ago
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning ☆43 · Updated last year
- Official implementation of Spatial-Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model ☆170 · Updated 3 weeks ago
- EO: Open-source Unified Embodied Foundation Model Series ☆44 · Updated 2 weeks ago