2U1 / Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

☆1,221

Alternatives and similar repositories for Qwen2-VL-Finetune

Users who are interested in Qwen2-VL-Finetune are comparing it to the libraries listed below.


TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆905 · Updated 5 months ago
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,401 · Updated 8 months ago
open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
☆3,127 · Updated last week
zhangfaen / finetune-Qwen2-VL
☆371 · Updated 8 months ago
Osilly / Vision-R1
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages …
☆706 · Updated last month
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆840 · Updated last month
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆707 · Updated 3 weeks ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆735 · Updated last month
gokayfem / awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
☆1,027 · Updated 7 months ago
TideDra / lmm-r1
Extends OpenRLHF to support LMM RL training for reproducing DeepSeek-R1 on multimodal tasks.
☆820 · Updated 4 months ago
daixiangzi / Awesome-Token-Compress
A paper list of recent work on token compression for ViTs and VLMs
☆679 · Updated 3 weeks ago
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆337 · Updated 7 months ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,721 · Updated last week
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides a valuable reference for researchers in the field of multimodality; please start your exploratory travel in RL-bas…
☆1,194 · Updated last week
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆568 · Updated 5 months ago
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,044 · Updated 10 months ago
yfzhang114 / Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
☆661 · Updated 3 weeks ago
DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,226 · Updated 8 months ago
showlab / Awesome-MLLM-Hallucination
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
☆859 · Updated last week
OpenGVLab / VisionLLM
VisionLLM Series
☆1,110 · Updated 7 months ago
microsoft / LLM2CLIP
LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.
☆553 · Updated 3 months ago
DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆1,000 · Updated last month
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,000 · Updated this week
LLaVA-VL / LLaVA-NeXT
☆4,288 · Updated 3 weeks ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆3,708 · Updated this week
TIGER-AI-Lab / VLM2Vec
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
☆421 · Updated 3 weeks ago
Visual-Agent / DeepEyes
☆851 · Updated last month
PKU-YuanGroup / LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
☆830 · Updated last year
AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆1,364 · Updated 2 weeks ago
dvlab-research / Seg-Zero
Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆524 · Updated 2 months ago