Darren-greenhand / LLaVA_OpenVLA
Converts the training data of OpenVLA into a general multimodal instruction format so it can be used with LLaVA-OneVision
☆19 · Updated 6 months ago
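The conversion this repo describes can be sketched roughly as follows: wrap each robot sample (image, language instruction, action) as a LLaVA-style conversation record. The field names (`instruction`, `action`, `conversations`, `from`/`value`) and the prompt wording are assumptions for illustration, not the repo's actual schema.

```python
# Hypothetical sketch of turning an OpenVLA-style sample into a
# LLaVA-style multimodal instruction record. Field names and prompt
# text are assumptions, not taken from the actual repository.

def openvla_to_llava(sample, image_path):
    """Wrap one (image, instruction, action) sample as a conversation record."""
    return {
        "image": image_path,  # path to the episode frame
        "conversations": [
            {
                "from": "human",
                # "<image>" is the conventional placeholder token that
                # LLaVA-style trainers replace with visual features.
                "value": "<image>\nWhat action should the robot take to "
                         + sample["instruction"] + "?",
            },
            {
                "from": "gpt",
                # Serialize the continuous action vector as plain text.
                "value": " ".join(str(a) for a in sample["action"]),
            },
        ],
    }

sample = {
    "instruction": "pick up the red block",
    "action": [0.1, -0.2, 0.05, 0.0, 0.0, 0.0, 1.0],
}
record = openvla_to_llava(sample, "episode_0/frame_0.png")
```

Real pipelines typically also discretize or tokenize the action vector rather than emitting raw floats; this sketch only illustrates the structural mapping.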
Alternatives and similar repositories for LLaVA_OpenVLA
Users interested in LLaVA_OpenVLA are comparing it to the libraries listed below
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆174 · Updated last month
- ☆12 · Updated 2 weeks ago
- ☆15 · Updated 2 months ago
- ☆64 · Updated 2 months ago
- ☆44 · Updated 3 months ago
- The repo of the paper `RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation`. ☆129 · Updated 7 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning. ☆222 · Updated last week
- Train a LLaVA model with improved Chinese-language support; training code and data are open-sourced. ☆64 · Updated 10 months ago
- ☆53 · Updated 5 months ago
- ☆90 · Updated 9 months ago
- [Arxiv 2025: MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation] ☆40 · Updated 3 months ago
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks. ☆146 · Updated last month
- Reproduction of OpenVLA, a large multimodal embodied-AI model, with fine-tuning improvements on the LIBERO dataset. ☆145 · Updated 4 months ago
- Latest Advances on Embodied Multimodal LLMs (or Vision-Language-Action Models). ☆116 · Updated last year
- Official implementation of MC-LLaVA. ☆32 · Updated last month
- The Next Step Forward in Multimodal LLM Alignment. ☆170 · Updated 2 months ago
- Official project page of "HiMix: Reducing Computational Complexity in Large Vision-Language Models". ☆13 · Updated 5 months ago
- [CVPR 2025] RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. Official Repository. ☆271 · Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 9 months ago
- ☆340 · Updated last year
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning. ☆83 · Updated last month
- A Simple Framework of Small-scale LMMs for Video Understanding. ☆72 · Updated last month
- [CVPR2024] This is the official implementation of MP5. ☆103 · Updated last year
- Official code of the paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution". ☆99 · Updated 5 months ago
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding. ☆28 · Updated 3 weeks ago
- Code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models". ☆19 · Updated 4 months ago
- SFT+RL boosts multimodal reasoning. ☆19 · Updated 3 weeks ago
- Official code for the paper "[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster". ☆83 · Updated 3 weeks ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆282 · Updated last month
- MLLM @ Game. ☆14 · Updated 2 months ago