agents-x-project/PyVision

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/agents-x-project/PyVision)

agents-x-project / PyVision

[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."

☆162

Alternatives and similar repositories for PyVision

Users that are interested in PyVision are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

agents-x-project / PyVision-RL
View on GitHub
[ICML 2026] Official implementation of "PyVision-RL: Forging Open Agentic Vision Models via RL."
☆69Feb 25, 2026Updated 4 months ago
ls-kelvin / REVPT
View on GitHub
Code for paper: Reinforced Vision Perception with Tools
☆74Oct 3, 2025Updated 9 months ago
zhaochen0110 / OpenThinkIMG
View on GitHub
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆399Jun 1, 2025Updated last year
agents-x-project / TIR-Bench
View on GitHub
[ECCV 2026] Official implementation of "TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning"
☆25Feb 8, 2026Updated 5 months ago
Visual-Agent / DeepEyes
View on GitHub
☆1,250Nov 20, 2025Updated 8 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
arctanxarc / GENIUS
View on GitHub
☆42May 9, 2026Updated 2 months ago
inclusionAI / M2-Reasoning
View on GitHub
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning
☆47Jul 17, 2025Updated last year
AntResearchNLP / ViLaSR
View on GitHub
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98Jul 27, 2025Updated 11 months ago
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493Mar 9, 2026Updated 4 months ago
sterzhang / PVIT
View on GitHub
Official Repository of Personalized Visual Instruct Tuning
☆34Mar 6, 2025Updated last year
OpenThinkIMG / OpenThinkIMG
View on GitHub
OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.
☆123Jul 11, 2025Updated last year
TIGER-AI-Lab / Pixel-Reasoner
View on GitHub
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆301Jun 4, 2026Updated last month
yfzhang114 / Thyme
View on GitHub
✨✨ [ICLR 2026] Think Beyond Images
☆582Sep 23, 2025Updated 9 months ago
Gabesarch / grounded-rl
View on GitHub
☆132Jul 22, 2025Updated 11 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
HITsz-TMG / ICL-State-Vector
View on GitHub
☆12Jul 4, 2024Updated 2 years ago
facebookresearch / multimodal_rewardbench
View on GitHub
Multimodal RewardBench
☆68Feb 21, 2025Updated last year
xinyan-cxy / MINT-CoT
View on GitHub
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
☆107Sep 19, 2025Updated 10 months ago
Mini-o3 / Mini-o3
View on GitHub
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆422Jan 29, 2026Updated 5 months ago
zhangzaibin / spagent
View on GitHub
SPAgent, a foundation agent for understanding, reasoning over, and operating within the physical and spatial world.
☆198Updated this week
MajorDavidZhang / Generalization_unified_VLM
View on GitHub
☆24May 23, 2025Updated last year
MikeWangWZHL / PAPO
View on GitHub
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
☆151Feb 4, 2026Updated 5 months ago
JierunChen / SFT-RL-SynergyDilemma
View on GitHub
☆15Jan 14, 2026Updated 6 months ago
OoDBag / VisTA
View on GitHub
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
☆27May 31, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
CYWang735 / AdaTooler-V
View on GitHub
☆71Feb 27, 2026Updated 4 months ago
uclanlp / OpenVLThinker
View on GitHub
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆154May 25, 2026Updated last month
UMass-Embodied-AGI / Mirage
View on GitHub
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
☆293Aug 2, 2025Updated 11 months ago
yuhui-zh15 / AutoConverter
View on GitHub
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40May 26, 2025Updated last year
GAIR-NLP / Med
View on GitHub
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-…
☆21May 15, 2026Updated 2 months ago
yunfeixie233 / ViGaL
View on GitHub
☆70Feb 4, 2026Updated 5 months ago
xyq7 / Human-Contribution-Measurement
View on GitHub
☆13Jun 4, 2025Updated last year
w-yibo / VTC-R1
View on GitHub
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
☆26Feb 20, 2026Updated 5 months ago
pipilurj / MLLM-protector
View on GitHub
The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
☆46Apr 21, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
EvolvingLMMs-Lab / multimodal-search-r1
View on GitHub
[ACL-2026] MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal…
☆469Apr 7, 2026Updated 3 months ago
Visual-Agent / DeepEyesV2
View on GitHub
☆624Feb 26, 2026Updated 4 months ago
Tiezheng11 / Vision-Language-Vision
View on GitHub
☆65Jul 11, 2025Updated last year
ChenShawn / MultiModal-Jupyter-Sandbox
View on GitHub
Simple code sandbox supporting jupyter notebook style code execution. Used for agent training
☆24Dec 5, 2025Updated 7 months ago
dongyh20 / Insight-V
View on GitHub
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆240Nov 7, 2025Updated 8 months ago
zhipeixu / AvatarShield
View on GitHub
AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection
☆23Jun 3, 2025Updated last year
jialuli-luka / Video-MSG
View on GitHub
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
☆28Apr 14, 2025Updated last year