dvlab-research / VisionReasoner
The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"
★241 · Updated this week
Alternatives and similar repositories for VisionReasoner
Users who are interested in VisionReasoner are comparing it to the repositories listed below.
- Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface" ★213 · Updated last month
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ★160 · Updated 7 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ★480 · Updated last week
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ★197 · Updated 6 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning ★236 · Updated 3 weeks ago
- New generation of CLIP with fine-grained discrimination capability, ICML2025 ★259 · Updated last week
- Official repo of Griffon series including v1 (ECCV 2024), v2, and G ★227 · Updated 2 months ago
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ★246 · Updated 7 months ago
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution ★51 · Updated 5 months ago
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ★234 · Updated 5 months ago
- ★77 · Updated 2 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ★139 · Updated 10 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models ★91 · Updated last month
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ★110 · Updated 2 weeks ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ★85 · Updated 7 months ago
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation" ★37 · Updated 4 months ago
- ★86 · Updated last year
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ★106 · Updated 3 weeks ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ★184 · Updated last year
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ★161 · Updated 9 months ago
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection ★177 · Updated 4 months ago
- [ECCV 2024] Official implementation of "LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction" ★82 · Updated 3 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ★89 · Updated 2 months ago
- Pixel-Level Reasoning Model trained with RL ★180 · Updated last month
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ★186 · Updated 4 months ago
- [NeurIPS 2024 Spotlight & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP) ★96 · Updated this week
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World" ★89 · Updated last month
- Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future ★191 · Updated 4 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ★60 · Updated 4 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ★211 · Updated 4 months ago