agents-x-project/PyVision-RL

[ICML 2026] Official implementation of "PyVision-RL: Forging Open Agentic Vision Models via RL."

☆69

Alternatives and similar repositories for PyVision-RL

Users who are interested in PyVision-RL are comparing it to the repositories listed below.

agents-x-project / PyVision
[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆162 · Jul 22, 2025 · Updated 11 months ago
agents-x-project / TIR-Bench
[ECCV 2026] Official implementation of "TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning"
☆25 · Feb 8, 2026 · Updated 5 months ago
wanglu-cs / Think_While_Watching
☆19 · Jun 26, 2026 · Updated 3 weeks ago
w-yibo / VTC-R1
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
☆26 · Feb 20, 2026 · Updated 5 months ago
CYWang735 / AdaTooler-V
☆71 · Feb 27, 2026 · Updated 4 months ago
qiujihao19 / LongVideo-R1
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
☆50 · Jul 7, 2026 · Updated 2 weeks ago
OoDBag / VisTA
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
☆27 · May 31, 2025 · Updated last year
EvolvingLMMs-Lab / SimpleStream
A simple video streaming baseline that outperforms SOTAs.
☆148 · May 1, 2026 · Updated 2 months ago
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
xuyang-liu16 / hermes-code-bridge
Use Hermes Agent as the control plane for local coding agents like Codex, Kimi Code, Claude Code, OpenCode, and Gemini CLI.
☆23 · May 28, 2026 · Updated last month
Jialuo-Li / DIG
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21 · Feb 21, 2026 · Updated 4 months ago
zsgvivo / VideoZoomer
☆34 · Feb 12, 2026 · Updated 5 months ago
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
viiika / Prism
[ICML 2026] Official Implementation of Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diff…
☆21 · Mar 4, 2026 · Updated 4 months ago
jylins / videoseek
[CVPR 2026] VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
☆64 · Mar 23, 2026 · Updated 3 months ago
Gabesarch / grounded-rl
☆132 · Jul 22, 2025 · Updated 11 months ago
haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆92 · May 8, 2026 · Updated 2 months ago
maifoundations / Streamo
Streaming Video Instruction Tuning
☆79 · Feb 25, 2026 · Updated 4 months ago
wangruohui / EfficientVideoAgent
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
☆26 · May 6, 2026 · Updated 2 months ago
64327069 / LVAgent
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39 · Nov 24, 2025 · Updated 7 months ago
QiWang98 / VideoRFT
[NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
☆65 · Jan 6, 2026 · Updated 6 months ago
lern-to-write / STC
[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
☆70 · Jun 8, 2026 · Updated last month
DocTron-hub / VinciCoder
☆42 · Jan 9, 2026 · Updated 6 months ago
bethgelab / supersanity
A critical analysis of the Cambrian-S model and VSI-Super benchmarks
☆16 · Nov 20, 2025 · Updated 8 months ago
egolife-ai / Ego-R1
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165 · Jun 10, 2026 · Updated last month
Gloria2tt / T3M
☆37 · Sep 5, 2024 · Updated last year
HJYao00 / R1-ShareVL
[NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward
☆38 · Sep 19, 2025 · Updated 10 months ago
SJTU-DENG-Lab / R1-Zero-VSI
☆42 · Jun 9, 2025 · Updated last year
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆101 · Oct 15, 2025 · Updated 9 months ago
mll-lab-nu / ViewAgent
☆20 · Jul 3, 2026 · Updated 2 weeks ago
ls-kelvin / REVPT
Code for paper: Reinforced Vision Perception with Tools
☆74 · Oct 3, 2025 · Updated 9 months ago
fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago
Hansxsourse / VRMDiff
☆11 · Mar 11, 2025 · Updated last year
zhangzaibin / spagent
SPAgent, a foundation agent for understanding, reasoning over, and operating within the physical and spatial world.
☆198 · Updated this week
EvolvingLMMs-Lab / ParaVT
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54 · Jun 2, 2026 · Updated last month
hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
☆26 · Feb 2, 2025 · Updated last year
GAIR-NLP / Med
[ICML 2026] What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-…
☆21 · May 15, 2026 · Updated 2 months ago
IVUL-KAUST / VideoAuto-R1
[CVPR 2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 · Feb 27, 2026 · Updated 4 months ago
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆131 · Nov 4, 2025 · Updated 8 months ago