YuZhaoshu / Efficient-VLAs-Survey
🔥 This is a curated paper list for the survey "A Survey on Efficient Vision-Language-Action Models". We will continue to maintain and update the repository, so follow us to keep up with the latest developments!
★110 · Updated this week
Alternatives and similar repositories for Efficient-VLAs-Survey
Users interested in Efficient-VLAs-Survey are comparing it to the repositories listed below.
- StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing · ★725 · Updated this week
- Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective · ★395 · Updated 6 months ago
- HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model · ★333 · Updated 3 months ago
- [NeurIPS 2025] VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation · ★58 · Updated 3 months ago
- InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy · ★330 · Updated last week
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained 🔥] · ★173 · Updated 2 months ago
- OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation · ★333 · Updated 4 months ago
- ★423 · Updated 3 weeks ago
- Building General-Purpose Robots Based on Embodied Foundation Model · ★645 · Updated last month
- Official code of the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" · ★122 · Updated 10 months ago
- Running VLA at a 30 Hz frame rate and 480 Hz trajectory frequency · ★338 · Updated last week
- Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment · ★197 · Updated 3 weeks ago
- [NeurIPS 2025] DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge · ★273 · Updated this week
- The repo of the paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation" · ★148 · Updated last year
- A curated list of large VLM-based VLA models for robotic manipulation · ★293 · Updated 3 weeks ago
- VLA-Arena: an open-source benchmark for systematic evaluation of Vision-Language-Action (VLA) models · ★97 · Updated this week
- Dexbotic: Open-Source Vision-Language-Action Toolbox · ★646 · Updated last week
- Unified Vision-Language-Action Model · ★257 · Updated 2 months ago
- Official code for VLA-OS · ★131 · Updated 6 months ago
- RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation · ★275 · Updated last month
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models" · ★326 · Updated 3 months ago
- 🔥 SpatialVLA: a spatial-enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025 · ★617 · Updated 6 months ago
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models) · ★122 · Updated last year
- [NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" · ★218 · Updated 3 weeks ago
- Galaxea's first VLA release · ★471 · Updated this week
- RynnVLA-002: A Unified Vision-Language-Action and World Model · ★818 · Updated last month
- Real-Time VLAs via Future-state-aware Asynchronous Inference · ★264 · Updated 3 weeks ago
- Latest Advances on Vision-Language-Action Models · ★124 · Updated 10 months ago
- A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation · ★395 · Updated 2 months ago
- The official implementation of "Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model" · ★404 · Updated last week