THUDM / GLM-4.1V-ThinkingLinks

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.

☆908

Alternatives and similar repositories for GLM-4.1V-Thinking

Users that are interested in GLM-4.1V-Thinking are comparing it to the libraries listed below

Sorting:

ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,351Updated last month
MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆1,014Updated 2 weeks ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆469Updated last week
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆549Updated 3 months ago
ByteDance-Seed / Seed-Thinking-v1.5
☆800Updated last month
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆604Updated 4 months ago
Visual-Agent / DeepEyes
☆679Updated 3 weeks ago
Kwai-Keye / Keye
☆484Updated last week
VectorSpaceLab / Video-XL
🔥🔥First-ever hour scale video understanding models
☆506Updated 2 weeks ago
Alibaba-NLP / VRAG
Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce…
☆287Updated last month
AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆999Updated last month
VectorSpaceLab / MegaPairs
[ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
☆209Updated 2 months ago
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,346Updated 5 months ago
Osilly / Vision-R1
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta…
☆652Updated 2 weeks ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆712Updated this week
HumanMLLM / R1-Omni
☆919Updated 4 months ago
zhangfaen / finetune-Qwen2-VL
☆366Updated 5 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆642Updated this week
Gen-Verse / MMaDA
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
☆1,254Updated last month
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆392Updated 2 months ago
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆796Updated 2 months ago
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with g…
☆434Updated 3 months ago
QwenLM / Qwen3-Embedding
☆1,123Updated 2 weeks ago
bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
☆244Updated 5 months ago
ByteDance-Seed / VideoWorld
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing thei…
☆605Updated last week
infinigence / Infini-Megrez-Omni
☆235Updated 5 months ago
HITsz-TMG / Awesome-Large-Multimodal-Reasoning-Models
The development and future prospects of multimodal reasoning models.
☆441Updated 2 weeks ago
Qihoo360 / Light-R1
☆734Updated 2 months ago
OpenBMB / MiniCPM-CookBook
This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi…
☆263Updated last month
Alibaba-NLP / OmniSearch
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
☆354Updated 3 months ago