OpenHelix-Team / CEED-VLA
Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.
☆ 25 · Updated 3 weeks ago
Alternatives and similar repositories for CEED-VLA
Users interested in CEED-VLA are comparing it to the repositories listed below.
- ☆ 37 · Updated last month
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆ 75 · Updated last month
- PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability ☆ 18 · Updated 3 months ago
- [CVPR 2025 Highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆ 23 · Updated last month
- [CVPR 2025 Highlight] Towards Autonomous Micromobility through Scalable Urban Simulation ☆ 81 · Updated this week
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge ☆ 71 · Updated this week
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆ 68 · Updated 2 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆ 29 · Updated last week
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆ 133 · Updated last month
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆ 64 · Updated last month
- 3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks. ☆ 66 · Updated 2 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆ 110 · Updated 4 months ago
- ☆ 45 · Updated 2 months ago
- ☆ 55 · Updated this week
- Improving 3D Large Language Model via Robust Instruction Tuning ☆ 60 · Updated 4 months ago
- Unified Vision-Language-Action Model ☆ 128 · Updated 2 weeks ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆ 109 · Updated last month
- A paper list of world models ☆ 28 · Updated 3 months ago
- ☆ 33 · Updated last year
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO