RUCAIBox / VirgoLinks

Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*

☆109

Alternatives and similar repositories for Virgo

Users that are interested in Virgo are comparing it to the libraries listed below

Sorting:

RifleZhang / LLaVA-Reasoner-DPO
☆102Updated 10 months ago
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆87Updated 10 months ago
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆166Updated 5 months ago
NUS-TRAIL / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆98Updated 2 months ago
OpenGVLab / V2PE
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆58Updated 11 months ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆117Updated last year
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆143Updated last month
MikeWangWZHL / PAPO
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆102Updated 3 months ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆89Updated 6 months ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆123Updated 6 months ago
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆178Updated 8 months ago
zwq2018 / Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…
☆84Updated 10 months ago
TideDra / VL-RLHF
A RLHF Infrastructure for Vision-Language Models
☆187Updated last year
Liuziyu77 / MMDU
Official repository of MMDU dataset
☆98Updated last year
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48Updated 5 months ago
LengSicong / MMR1
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
☆210Updated 2 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆87Updated 3 months ago
Open-Reasoner-Zero / Open-Vision-Reasoner
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reason…
☆144Updated 2 months ago
vlf-silkie / VLFeedback
☆100Updated last year
HZQ950419 / Math-LLaVA
Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
☆92Updated last year
si0wang / ThinkLite-VL
☆105Updated 5 months ago
thunlp / Muffin
☆66Updated last year
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆55Updated 9 months ago
EvolvingLMMs-Lab / OpenMMReasoner
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆111Updated this week
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆145Updated 7 months ago
ShadeCloak / ADORA
☆46Updated 7 months ago
Kun-Xiang / AtomThink
Offical Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning"
☆57Updated 2 weeks ago
EvolvingLMMs-Lab / multimodal-search-r1
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…
☆359Updated 3 months ago
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆201Updated last year