UCSC-VLAA/VLAA-Thinking

[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

☆148

Alternatives and similar repositories for VLAA-Thinking

Users that are interested in VLAA-Thinking are comparing it to the libraries listed below.

UCSC-VLAA / ReasoningEval
Official repo of Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains.
☆43 · Jun 6, 2025 · Updated last year
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆190 · Jun 5, 2025 · Updated last year
UCSC-VLAA / VLAA-GUI
Official implementation of VLAA-GUI series
☆34 · Jun 20, 2026 · Updated last month
haojinw0027 / MedFrameQA
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
☆18 · Jun 6, 2025 · Updated last year
UCSC-VLAA / EarthWhere
☆16 · Nov 15, 2025 · Updated 8 months ago
UCSC-VLAA / STAR-1
[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
☆38 · Apr 7, 2025 · Updated last year
WangRongsheng / Med-R1
Encourage Medical LLM to engage in deep thinking similar to DeepSeek-R1.
☆26 · Apr 24, 2025 · Updated last year
ModalMinds / MM-PRM
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
☆30 · May 26, 2025 · Updated last year
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40 · Apr 18, 2025 · Updated last year
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆624 · Mar 18, 2025 · Updated last year
uclanlp / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆155 · May 25, 2026 · Updated 2 months ago
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆770 · Sep 7, 2025 · Updated 10 months ago
real-absolute-AI / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆112 · Sep 18, 2025 · Updated 10 months ago
ImKeTT / ZeroGen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation
☆14 · Oct 7, 2023 · Updated 2 years ago
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆90 · Nov 28, 2023 · Updated 2 years ago
UCSC-VLAA / m1
[ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
☆51 · Dec 21, 2025 · Updated 7 months ago
UCSC-VLAA / CIK-Bench
Official repository for Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
☆69 · May 2, 2026 · Updated 2 months ago
Osilly / Vision-R1
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that…
☆1,584 · Mar 20, 2026 · Updated 4 months ago
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
☆152 · Jun 13, 2024 · Updated 2 years ago
Letian2003 / MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…
☆40 · Jun 4, 2025 · Updated last year
minglllli / CLS-RL
[NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆90 · Sep 19, 2025 · Updated 10 months ago
UCSC-VLAA / o1_medical
☆48 · Feb 26, 2025 · Updated last year
UCSC-VLAA / MedVLThinker
[ML4H'25] MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
☆59 · Dec 21, 2025 · Updated 7 months ago
OliverRensu / MVG
☆61 · Jun 18, 2024 · Updated 2 years ago
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆848 · May 14, 2025 · Updated last year
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,013 · Jul 7, 2026 · Updated 2 weeks ago
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,065 · May 19, 2025 · Updated last year
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 · Jun 17, 2025 · Updated last year
si0wang / ThinkLite-VL
☆105 · Jun 10, 2025 · Updated last year
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,593 · Feb 8, 2025 · Updated last year
XenoZLH / Shuffle-R1
Official code repository of Shuffle-R1
☆26 · Feb 23, 2026 · Updated 5 months ago
FreedomIntelligence / FastLLM
Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];
☆41 · Jan 4, 2024 · Updated 2 years ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,081 · Updated this week
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,268 · Aug 27, 2025 · Updated 10 months ago
UCSC-VLAA / ClinSeekAgent
☆30 · Jun 1, 2026 · Updated last month
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,263 · Oct 29, 2025 · Updated 8 months ago
si0wang / VisVM
☆46 · Dec 30, 2024 · Updated last year
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆422 · Jan 29, 2026 · Updated 5 months ago
IntMeGroup / LMM4LMM
[ICCV 2025 Highlight] LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
☆20 · Nov 16, 2025 · Updated 8 months ago