ls-kelvin/REVPT

Code for paper: Reinforced Vision Perception with Tools

☆74

Alternatives and similar repositories for REVPT

Users that are interested in REVPT are comparing it to the libraries listed below.

OoDBag / VisTA
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
☆27 | May 31, 2025 | Updated last year
agents-x-project / PyVision
[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆162 | Jul 22, 2025 | Updated last year
zxiangx / LC-R1
Code for paper: Optimizing Length Compression in Large Reasoning Models
☆29 | Oct 20, 2025 | Updated 9 months ago
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423 | Jan 29, 2026 | Updated 6 months ago
zzzhhzzz / Ground-R1
☆46 | Jul 14, 2025 | Updated last year
Visual-Agent / DeepEyes
☆1,251 | Nov 20, 2025 | Updated 8 months ago
yfzhang114 / Thyme
✨✨ [ICLR 2026] Think Beyond Images
☆584 | Sep 23, 2025 | Updated 10 months ago
CongHan0808 / DeOP
Open-vocabulary Semantic Segmentation
☆33 | Feb 16, 2024 | Updated 2 years ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102 | Oct 15, 2025 | Updated 9 months ago
Lucky-Wang-Chenlong / CodeSync
[ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
☆24 | Jul 31, 2025 | Updated 11 months ago
thunlp / KARL
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
☆68 | Apr 5, 2026 | Updated 3 months ago
EvolvingLMMs-Lab / multimodal-search-r1
[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal…
☆470 | Apr 7, 2026 | Updated 3 months ago
VidCapBench / VidCapBench
☆13 | May 17, 2025 | Updated last year
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆301 | Jun 4, 2026 | Updated last month
OpenSenseNova / SenseNova-MARS
☆122 | Apr 9, 2026 | Updated 3 months ago
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆289 | Jul 15, 2025 | Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆68 | Feb 21, 2025 | Updated last year
zhaochen0110 / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆399 | Jun 1, 2025 | Updated last year
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 | Jul 27, 2025 | Updated last year
IVUL-KAUST / VideoAuto-R1
[CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 | Feb 27, 2026 | Updated 5 months ago
Aurora-slz / MM-Verify
☆19 | Oct 28, 2025 | Updated 9 months ago
ModalMinds / gym-v
A unified framework for vision-language environments with Gymnasium-compatible interface
☆35 | Mar 17, 2026 | Updated 4 months ago
JIA-Lab-research / VisionReasoner
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
☆348 | Feb 9, 2026 | Updated 5 months ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆884 | Dec 14, 2025 | Updated 7 months ago
OpenThinkIMG / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.
☆123 | Jul 11, 2025 | Updated last year
Dongping-Chen / Clawatar
From Agentic Intelligence to Interactive Intelligence. Give your AI agent a body and home.
☆19 | Feb 22, 2026 | Updated 5 months ago
VTool-R1 / VTool-R1
[ICLR 2026] "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"
☆199 | Mar 20, 2026 | Updated 4 months ago
yunfeixie233 / ViGaL
☆70 | Feb 4, 2026 | Updated 5 months ago
SII-Ferenas / PGSeg
This is the official code of "Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation, NeurIPS 23"
☆27 | Dec 7, 2023 | Updated 2 years ago
spacetools / SpaceTools
code release
☆38 | Jun 22, 2026 | Updated last month
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268 | Oct 18, 2025 | Updated 9 months ago
FelixHertlein / inv3d
Project page for the ICDAR 2023 Paper "Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping".
☆13 | Dec 21, 2023 | Updated 2 years ago
xuliu-cyber / RSUniVLM
☆47 | Apr 16, 2026 | Updated 3 months ago
ModalMinds / MM-PRM
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
☆30 | May 26, 2025 | Updated last year
ThinkMorph / ThinkMorph
[ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
☆192 | May 1, 2026 | Updated 2 months ago
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆848 | May 14, 2025 | Updated last year
OrigamiSL / OTETrack
Source code of the paper: Overlapped Trajectory-Enhanced Visual Tracking
☆11 | Sep 3, 2024 | Updated last year
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,262 | Oct 29, 2025 | Updated 9 months ago
VoyageWang / VG-Refiner
The repository of VG-Refiner paper
☆20 | Dec 9, 2025 | Updated 7 months ago