Darren-greenhand / LLaVA_OpenVLA
Converts the training data of OpenVLA into a general multimodal instruction format so it can be used with LLaVA-OneVision
☆19 · Updated 6 months ago
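The conversion this repo describes can be sketched roughly as follows: wrap each robot sample (image, language instruction, action) as a LLaVA-style conversation record. The field names (`instruction`, `action`, `conversations`, `from`/`value`) and the prompt wording are assumptions for illustration, not the repo's actual schema.

```python
# Hypothetical sketch of turning an OpenVLA-style sample into a
# LLaVA-style multimodal instruction record. Field names and prompt
# text are assumptions, not taken from the actual repository.

def openvla_to_llava(sample, image_path):
    """Wrap one (image, instruction, action) sample as a conversation record."""
    return {
        "image": image_path,  # path to the episode frame
        "conversations": [
            {
                "from": "human",
                # "<image>" is the conventional placeholder token that
                # LLaVA-style trainers replace with visual features.
                "value": "<image>\nWhat action should the robot take to "
                         + sample["instruction"] + "?",
            },
            {
                "from": "gpt",
                # Serialize the continuous action vector as plain text.
                "value": " ".join(str(a) for a in sample["action"]),
            },
        ],
    }

sample = {
    "instruction": "pick up the red block",
    "action": [0.1, -0.2, 0.05, 0.0, 0.0, 0.0, 1.0],
}
record = openvla_to_llava(sample, "episode_0/frame_0.png")
```

Real pipelines typically also discretize or tokenize the action vector rather than emitting raw floats; this sketch only illustrates the structural mapping.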
Alternatives and similar repositories for LLaVA_OpenVLA
Users interested in LLaVA_OpenVLA are comparing it to the libraries listed below
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆174 · Updated last month
- ☆12 · Updated 2 weeks ago
- ☆15 · Updated 2 months ago
- ☆64 · Updated 2 months ago
- ☆44 · Updated 3 months ago
- The repo of the paper `RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation`. ☆129 · Updated 7 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning. ☆222 · Updated last week
- Train a LLaVA model with improved Chinese-language support; training code and data are open-sourced. ☆64 · Updated 10 months ago
- ☆53 · Updated 5 months ago
- ☆90 · Updated 9 months ago
- [Arxiv 2025: MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation] ☆40 · Updated 3 months ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks. ☆146 · Updated last month
- Reproduction of OpenVLA, a large multimodal embodied-AI model, with fine-tuning improvements on the LIBERO dataset. ☆145 · Updated 4 months ago
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models). ☆116 · Updated last year
- Official implementation of MC-LLaVA. ☆32 · Updated last month
- The Next Step Forward in Multimodal LLM Alignment. ☆170 · Updated 2 months ago
- Official project page of "HiMix: Reducing Computational Complexity in Large Vision-Language Models". ☆13 · Updated 5 months ago
- [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository. ☆271 · Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 9 months ago
- ☆340 · Updated last year
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning. ☆83 · Updated last month
- A Simple Framework of Small-scale LMMs for Video Understanding. ☆72 · Updated last month
- [CVPR2024] This is the official implementation of MP5. ☆103 · Updated last year
- Official code of the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution". ☆99 · Updated 5 months ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding. ☆28 · Updated 3 weeks ago
- Code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models". ☆19 · Updated 4 months ago
- SFT+RL boosts multimodal reasoning. ☆19 · Updated 3 weeks ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster". ☆83 · Updated 3 weeks ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆282 · Updated last month
- MLLM @ Game. ☆14 · Updated 2 months ago