zai-org / GLM-V

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

☆1,710

Alternatives and similar repositories for GLM-V

Users who are interested in GLM-V are comparing it to the repositories listed below.

MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆1,071 Updated 3 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,463 Updated 4 months ago
HumanMLLM / R1-Omni
☆956 Updated 6 months ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆570 Updated 2 months ago
AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆1,373 Updated last month
QwenLM / Qwen3-Embedding
☆1,498 Updated 3 weeks ago
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆569 Updated 6 months ago
Kwai-Keye / Keye
☆681 Updated 3 weeks ago
ByteDance-Seed / m3-agent
☆1,033 Updated last week
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,409 Updated 8 months ago
meituan-longcat / LongCat-Flash-Chat
☆1,160 Updated this week
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,208 Updated 7 months ago
QwenLM / Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆3,728 Updated 4 months ago
DAMO-NLP-SG / VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
☆1,011 Updated 2 months ago
Visual-Agent / DeepEyes
☆868 Updated last month
ByteDance-Seed / Seed-Thinking-v1.5
☆817 Updated 4 months ago
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆613 Updated 7 months ago
VectorSpaceLab / Video-XL
🔥🔥First-ever hour scale video understanding models
☆557 Updated 3 months ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆751 Updated last month
Gen-Verse / MMaDA
[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
☆1,434 Updated last week
Alibaba-NLP / ZeroSearch
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
☆1,167 Updated 2 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆3,836 Updated this week
Osilly / Vision-R1
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages …
☆717 Updated last month
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆718 Updated last month
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,336 Updated 2 months ago
QwenLM / Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆2,699 Updated last week
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆825 Updated 5 months ago
NVlabs / describe-anything
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
☆1,365 Updated 3 months ago
BytedTsinghua-SIA / DAPO
An Open-source RL System from ByteDance Seed and Tsinghua AIR
☆1,597 Updated 5 months ago
PKU-YuanGroup / LLaVA-CoT
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
☆2,084 Updated this week