dvlab-research / VisionReasoner

Vision Manus: Your versatile Visual AI assistant

☆304

Alternatives and similar repositories for VisionReasoner

Users who are interested in VisionReasoner are comparing it to the repositories listed below.

nnnth / UFO
[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…
☆261 Updated last month
dvlab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆580 Updated 4 months ago
congvvc / HyperSeg
[CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
☆177 Updated last year
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆209 Updated 2 months ago
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆209 Updated 2 months ago
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆277 Updated 5 months ago
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆245 Updated 10 months ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆256 Updated last month
zamling / PSALM
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
☆266 Updated 11 months ago
appletea233 / AL-Ref-SAM2
[AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video…
☆91 Updated last year
eric-ai-lab / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆165 Updated 3 weeks ago
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.
☆247 Updated 4 months ago
PKU-ICST-MIPL / DyFo_CVPR2025
☆100 Updated 4 months ago
xiaomoguhz / DeCLIP
[CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
☆147 Updated 6 months ago
mc-lan / Text4Seg
[ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation
☆156 Updated last month
LeapLabTHU / GSVA
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
☆155 Updated last year
360CVGroup / FG-CLIP
New generation of CLIP with fine grained discrimination capability, ICML2025
☆516 Updated 2 months ago
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆199 Updated last year
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆230 Updated last month
callsys / DynRefer
[CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
☆57 Updated 9 months ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆150 Updated last week
FoundationVision / GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
☆187 Updated 8 months ago
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆201 Updated 5 months ago
OpenGVLab / VideoChat-R1
[NeurIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆252 Updated 2 months ago
HVision-NKU / MaskCLIPpp
Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation"
☆43 Updated 9 months ago
Christinepan881 / DINO-R1
☆53 Updated 5 months ago
yu-rp / VisualPerceptionToken
☆133 Updated 9 months ago
Fantasyele / LLaVA-KD
[ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
☆116 Updated 2 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆168 Updated last year
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆221 Updated 9 months ago