Code for paper: Reinforced Vision Perception with Tools
☆73Oct 3, 2025Updated 5 months ago
Alternatives and similar repositories for REVPT
Users who are interested in REVPT are comparing it to the libraries listed below.
- This repository contains a PyTorch implementation of the ICSE'26 paper "Scrub It Out! Erasing Sensitive Memorization in Code Language Mod…☆30Sep 18, 2025Updated 6 months ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆55Mar 15, 2026Updated 2 weeks ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆172Jan 26, 2026Updated 2 months ago
- Improving Math reasoning through Direct Preference Optimization with Verifiable Pairs☆19Mar 20, 2025Updated last year
- #ICCV, #MoE, #Tracking☆33Jul 11, 2025Updated 8 months ago
- Professional desktop app for converting text to audiobooks with local TTS☆31Oct 6, 2025Updated 5 months ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆26May 31, 2025Updated 9 months ago
- Vertebral-level CT/X-ray registration through joint 3D Radiative Gaussians (RadGS) reconstruction and 3D/3D registration.☆32Oct 18, 2025Updated 5 months ago
- This repository contains the code for the paper - "Aligning Text, Images, and 3D Structure Token-by-Token" (CVPR 2026)☆44Jun 11, 2025Updated 9 months ago
- Building an Intelligent AWS Cloud Engineer Agent with Strands Agents SDK☆24Dec 16, 2025Updated 3 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Updated this week
- ☆22Apr 15, 2025Updated 11 months ago
- Generative Regional Editing (GRE) Benchmark☆19Sep 10, 2024Updated last year
- 🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)☆234Jan 4, 2026Updated 2 months ago
- Test-Time Memory Framework: Control Hallucinations in Foundation Models☆11Nov 4, 2025Updated 4 months ago
- Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table☆27Jun 2, 2023Updated 2 years ago
- ☆12Dec 6, 2024Updated last year
- ☆13May 17, 2025Updated 10 months ago
- A CustomNet node for ComfyUI☆10Aug 11, 2024Updated last year
- A collection of Claude commands and utilities☆25Updated this week
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆43Mar 2, 2026Updated 3 weeks ago
- Fastest way to scaffold FastHTML applications.☆36Sep 13, 2025Updated 6 months ago
- ☆12Dec 4, 2024Updated last year
- paper-read-notes☆13Sep 26, 2024Updated last year
- AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network☆16Feb 11, 2025Updated last year
- [CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models☆18Jul 22, 2024Updated last year
- Official code for "VideoReward Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning"☆45Oct 20, 2025Updated 5 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆50Jan 14, 2025Updated last year
- Zsh completion plugin for the LLM CLI tool by Simon Willison☆20May 28, 2025Updated 10 months ago
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆13Jun 7, 2025Updated 9 months ago
- Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"☆332Jan 6, 2026Updated 2 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆414Aug 26, 2025Updated 7 months ago
- ☆12Jan 10, 2025Updated last year
- From Word to World: Can Large Language Models be Implicit Text-based World Models?☆55Dec 25, 2025Updated 3 months ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆12Jun 11, 2024Updated last year
- Open-vocabulary Semantic Segmentation☆33Feb 16, 2024Updated 2 years ago
- Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"☆15Jan 25, 2024Updated 2 years ago
- [NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning☆287Jul 15, 2025Updated 8 months ago
- [SIGGRAPH Asia 2025] The official implementation of the paper "DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinat…☆34Mar 10, 2026Updated 2 weeks ago