dvlab-research / VisionReasoner
The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"
☆222 · Updated last month
Alternatives and similar repositories for VisionReasoner
Users who are interested in VisionReasoner are comparing it to the repositories listed below.
- Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface" ☆205 · Updated last month
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ☆156 · Updated 7 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆452 · Updated last month
- New generation of CLIP with fine grained discrimination capability, ICML2025 ☆228 · Updated this week
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ☆243 · Updated 6 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding ☆194 · Updated 5 months ago
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution ☆51 · Updated 4 months ago
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆229 · Updated 5 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning ☆219 · Updated 2 weeks ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ☆105 · Updated 3 months ago
- Official repo of Griffon series including v1(ECCV 2024), v2, and G ☆223 · Updated last month
- ☆69 · Updated 2 months ago
- ☆85 · Updated last year
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation" ☆36 · Updated 3 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ☆137 · Updated 10 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models ☆87 · Updated last week
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ☆157 · Updated 9 months ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ☆84 · Updated 6 months ago
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ☆102 · Updated last week
- [NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP) ☆92 · Updated last month
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection ☆177 · Updated 3 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆181 · Updated 11 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything ☆63 · Updated last year
- (CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La… ☆270 · Updated last month
- [ECCV 2024] Official implementation of "LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction" ☆78 · Updated 3 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆163 · Updated 9 months ago
- Official code for paper "GRIT: Teaching MLLMs to Think with Images" ☆105 · Updated 3 weeks ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆211 · Updated 3 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆206 · Updated last year
- Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future ☆187 · Updated 3 months ago