XiaomiMiMo / MiMo-VL
☆378 Updated 2 weeks ago
Alternatives and similar repositories for MiMo-VL
Users who are interested in MiMo-VL are comparing it to the repositories listed below.
- Explore the Multimodal “Aha Moment” on 2B Model ☆592 Updated 3 months ago
- ☆504 Updated this week
- ☆161 Updated 4 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆528 Updated 2 months ago
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆149 Updated last month
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources ☆226 Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆569 Updated 3 weeks ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆654 Updated 3 weeks ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆302 Updated last month
- Long Context Transfer from Language to Vision ☆381 Updated 3 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆328 Updated last week
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆267 Updated 4 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆893 Updated 2 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆159 Updated 3 months ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆378 Updated last month
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆348 Updated 4 months ago
- ☆269 Updated 3 weeks ago
- This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta… ☆607 Updated last week
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆204 Updated 5 months ago
- The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning ☆263 Updated 3 weeks ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆202 Updated 2 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,247 Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆130 Updated 2 months ago
- Official implementation of UnifiedReward & UnifiedReward-Think ☆417 Updated last week
- ☆219 Updated 3 weeks ago
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,109 Updated last week
- Visual Planning: Let's Think Only with Images ☆224 Updated last month
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆64 Updated last month
- 🔥🔥First-ever hour scale video understanding models ☆437 Updated 2 weeks ago
- ☆789 Updated 2 weeks ago