linkangheng / PR1Links

Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning

☆228

Alternatives and similar repositories for PR1

Users that are interested in PR1 are comparing it to the libraries listed below

Sorting:

tanhuajie / Reason-RFT
⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.
☆184Updated last week
dvlab-research / VisionReasoner
The official implement of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"
☆238Updated 2 weeks ago
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆170Updated 3 months ago
Open-Reasoner-Zero / Open-Vision-Reasoner
The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning".
☆118Updated 3 weeks ago
dvlab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆480Updated this week
FanqingM / MM-Eureka-V0
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆313Updated last month
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2, and G
☆227Updated 2 months ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆352Updated 5 months ago
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆186Updated 4 months ago
AMAP-ML / GPG
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
☆152Updated 2 months ago
deepcs233 / Visual-CoT
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆354Updated 7 months ago
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆216Updated last month
nnnth / UFO
Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
☆211Updated last month
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆129Updated 5 months ago
TheEighthDay / SeekWorld
The first attempt to replicate o3-like visual clue-tracking reasoning capabilities.
☆58Updated 3 weeks ago
OpenRLHF / OpenRLHF-M
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
☆138Updated 3 months ago
ZhangXJ199 / TinyLLaVA-Video-R1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
☆88Updated 2 months ago
xinyan-cxy / MINT-CoT
☆62Updated last month
Visual-Agent / DeepEyes
☆679Updated 3 weeks ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL
☆180Updated last month
ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆79Updated 3 weeks ago
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆84Updated last month
ding523 / Curr_REFT
☆67Updated 2 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆101Updated 3 months ago
Adlith / MoE-Jetpack
[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
☆129Updated 8 months ago
Osilly / Vision-R1
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta…
☆659Updated 3 weeks ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆240Updated 3 months ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆117Updated 4 months ago
Video-R1 / Awesome-Multimodal-Reasoning
Collections of Papers and Projects for Multimodal Reasoning.
☆105Updated 3 months ago
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆197Updated 6 months ago