QwenLM/Qwen3-VL

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/QwenLM/Qwen3-VL)

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

☆19,694

Alternatives and similar repositories for Qwen3-VL

Users that are interested in Qwen3-VL are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OpenGVLab / InternVL
View on GitHub
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
☆10,111Sep 22, 2025Updated 10 months ago
QwenLM / Qwen3
View on GitHub
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
☆27,451Jan 9, 2026Updated 6 months ago
modelscope / ms-swift
View on GitHub
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆15,009Updated this week
QwenLM / Qwen-VL
View on GitHub
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,718Aug 7, 2024Updated last year
haotian-liu / LLaVA
View on GitHub
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,956Aug 12, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,713Jun 15, 2026Updated last month
hiyouga / LlamaFactory
View on GitHub
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆73,625Updated this week
verl-project / verl
View on GitHub
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,737Updated this week
open-compass / VLMEvalKit
View on GitHub
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
☆4,311Jul 22, 2026Updated last week
om-ai-lab / VLM-R1
View on GitHub
Solve Visual Understanding with Reinforced VLMs
☆6,018Jul 7, 2026Updated 3 weeks ago
hiyouga / EasyR1
View on GitHub
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,092Updated this week
vllm-project / vllm
View on GitHub
A high-throughput and memory-efficient inference and serving engine for LLMs
☆87,645Updated this week
ByteDance-Seed / Bagel
View on GitHub
Open-source unified multimodal model
☆6,131May 4, 2026Updated 2 months ago
BradyFU / Awesome-Multimodal-Large-Language-Models
View on GitHub
Latest Advances on Multimodal Large Language Models
☆17,962Updated this week
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
openai / CLIP
View on GitHub
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
☆34,100Mar 25, 2026Updated 4 months ago
QwenLM / Qwen-Agent
View on GitHub
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆16,879Mar 4, 2026Updated 4 months ago
OpenBMB / MiniCPM-V
View on GitHub
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆26,059Jul 23, 2026Updated last week
facebookresearch / dinov3
View on GitHub
Reference PyTorch implementation and models for DINOv3
☆11,063Jul 15, 2026Updated 2 weeks ago
facebookresearch / sam2
View on GitHub
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode…
☆19,630May 30, 2026Updated 2 months ago
QwenLM / Qwen-Image
View on GitHub
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
☆8,184Feb 10, 2026Updated 5 months ago
ByteDance-Seed / Seed1.5-VL
View on GitHub
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,584Jun 14, 2025Updated last year
QwenLM / Qwen2.5-Omni
View on GitHub
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆4,056Jun 12, 2025Updated last year
2U1 / Qwen-VL-Series-Finetune
View on GitHub
An open-source implementaion for fine-tuning Qwen-VL series by Alibaba Cloud.
☆1,947Updated this week
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
StarsfieldAI / R1-V
View on GitHub
Witness the aha moment of VLM with less than $3.
☆4,065May 19, 2025Updated last year
Dao-AILab / flash-attention
View on GitHub
Fast and memory-efficient exact attention
☆24,580Updated this week
QwenLM / Qwen3-Omni
View on GitHub
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,922Apr 23, 2026Updated 3 months ago
facebookresearch / dinov2
View on GitHub
PyTorch code and models for the DINOv2 self-supervised learning method.
☆13,176Jun 3, 2026Updated last month
deepseek-ai / Janus
View on GitHub
Janus-Series: Unified Multimodal Understanding and Generation Models
☆17,758Feb 1, 2025Updated last year
IDEA-Research / GroundingDINO
View on GitHub
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
☆10,459Aug 12, 2024Updated last year
Wan-Video / Wan2.1
View on GitHub
Wan: Open and Advanced Large-Scale Video Generative Models
☆16,696Mar 5, 2026Updated 4 months ago
facebookresearch / segment-anything
View on GitHub
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…
☆54,625Sep 18, 2024Updated last year
mlfoundations / open_clip
View on GitHub
An open source implementation of CLIP.
☆14,034Jul 17, 2026Updated last week
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
QwenLM / Qwen
View on GitHub
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
☆21,500Mar 5, 2026Updated 4 months ago
unslothai / unsloth
View on GitHub
Unsloth is a local UI for training and running Kimi K3, Gemma 4, Qwen3.6, DeepSeek, GLM and other models.
☆69,203Updated this week
zai-org / GLM-V
View on GitHub
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆2,363Jul 21, 2026Updated last week
salesforce / LAVIS
View on GitHub
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,259Jun 2, 2026Updated last month
EvolvingLMMs-Lab / lmms-eval
View on GitHub
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆4,339Updated this week
sgl-project / sglang
View on GitHub
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,970Updated this week
Liuziyu77 / Visual-RFT
View on GitHub
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'’
☆2,263Oct 29, 2025Updated 9 months ago