ls-kelvin / REVPT
Code for paper: Reinforced Vision Perception with Tools
☆70Oct 3, 2025Updated 4 months ago
Alternatives and similar repositories for REVPT
Users who are interested in REVPT are comparing it to the libraries listed below.
- Code accompanying the 2022 DLS paper "Misleading Deep-Fake Detection with GAN Fingerprints"☆10May 26, 2022Updated 3 years ago
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆22May 31, 2025Updated 8 months ago
- Vertebral-level CT/X-ray registration through joint 3D Radiative Gaussians (RadGS) reconstruction and 3D/3D registration.☆26Oct 18, 2025Updated 3 months ago
- paper-read-notes☆13Sep 26, 2024Updated last year
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆12Jun 11, 2024Updated last year
- Open-vocabulary Semantic Segmentation☆33Feb 16, 2024Updated 2 years ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models?☆45Dec 25, 2025Updated last month
- #ICCV, #MoE, #Tracking☆33Jul 11, 2025Updated 7 months ago
- Building an Intelligent AWS Cloud Engineer Agent with Strands Agents SDK☆23Dec 16, 2025Updated last month
- This repository contains a PyTorch implementation of the ICSE'26 paper "Scrub It Out! Erasing Sensitive Memorization in Code Language Mod…☆29Sep 18, 2025Updated 4 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated last month
- ☆18Mar 1, 2024Updated last year
- ☆75Jun 28, 2025Updated 7 months ago
- ☆21Jan 17, 2025Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆19Jul 20, 2024Updated last year
- An image retrieval system based on MXNET : From training to website☆18Jul 9, 2019Updated 6 years ago
- This repository contains the code for the paper - "Aligning Text, Images, and 3D Structure Token-by-Token"☆42Jun 11, 2025Updated 8 months ago
- Official Implementation of "Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning"☆25Dec 16, 2025Updated last month
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision☆27May 26, 2025Updated 8 months ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference☆97Mar 26, 2025Updated 10 months ago
- [CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models☆18Jul 22, 2024Updated last year
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at Neurips 2023 (Main confer…☆27Mar 29, 2024Updated last year
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆64Oct 22, 2024Updated last year
- [NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning☆285Jul 15, 2025Updated 7 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆62Nov 7, 2024Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Jun 10, 2025Updated 8 months ago
- 🔥🔥🔥Latest Papers, Codes on Uncertainty-based RL☆60Aug 24, 2025Updated 5 months ago
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.☆30Nov 13, 2025Updated 3 months ago
- This repo holds the competitions (information, solutions, summaries, memories) that our team has participated in☆26Feb 4, 2024Updated 2 years ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆62Dec 8, 2025Updated 2 months ago
- ☆28May 22, 2025Updated 8 months ago
- ☆135Jan 26, 2026Updated 3 weeks ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too…☆392Aug 26, 2025Updated 5 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"☆31Apr 20, 2025Updated 9 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆41Aug 4, 2025Updated 6 months ago
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"☆145Jan 26, 2026Updated 2 weeks ago
- [ICLR 2026] P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark☆47Jun 6, 2025Updated 8 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆45Jul 2, 2025Updated 7 months ago
- TrackGPT: Track What You Need in Videos via Text Prompts☆25May 16, 2023Updated 2 years ago