zai-org / GLM-V
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆1,599 · Updated 2 weeks ago
Alternatives and similar repositories for GLM-V
Users who are interested in GLM-V are comparing it to the libraries listed below.
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…☆1,416 · Updated 2 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities☆1,053 · Updated last month
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.☆1,321 · Updated 3 weeks ago
- MiMo-VL☆538 · Updated 3 weeks ago
- ☆939 · Updated 5 months ago
- R1-Onevision, a visual language model capable of deep CoT reasoning.☆565 · Updated 4 months ago
- Next-Token Prediction is All You Need☆2,192 · Updated 5 months ago
- A fork to add multimodal model training to open-r1☆1,387 · Updated 7 months ago
- ☆791 · Updated last week
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆3,598 · Updated 2 months ago
- ☆958 · Updated this week
- Explore the Multimodal “Aha Moment” on a 2B Model☆607 · Updated 5 months ago
- ☆812 · Updated 3 months ago
- MMaDA: Open-Source Multimodal Large Diffusion Language Models☆1,341 · Updated 3 weeks ago
- ☆1,331 · Updated last month
- ☆867 · Updated last week
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆733 · Updated this week
- ☆615 · Updated last week
- ZeroSearch: Incentivize the Search Capability of LLMs without Searching☆1,125 · Updated 3 weeks ago
- The first paper to explore how to effectively use R1-like RL for MLLMs, introducing Vision-R1, a reasoning MLLM that leverages …☆688 · Updated this week
- Extends OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆818 · Updated 3 months ago
- ☆741 · Updated last week
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL☆3,552 · Updated last week
- Frontier Multimodal Foundation Models for Image and Video Understanding☆970 · Updated 3 weeks ago
- 🔥🔥 First-ever hour-scale video understanding models☆538 · Updated last month
- Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct, and reasoning models, developed by ByteDance Seed.☆549 · Updated 3 months ago
- Muon is Scalable for LLM Training☆1,302 · Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥 the first paper to explore R1 for video]☆685 · Updated this week
- ☆368 · Updated 7 months ago
- Official implementation of the BLIP3o series☆1,459 · Updated last week