dvlab-research / Seg-Zero

Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"

☆580

Alternatives and similar repositories for Seg-Zero

Users who are interested in Seg-Zero are comparing it to the repositories listed below

dvlab-research / VisionReasoner
Vision Manus: Your versatile Visual AI assistant
☆304 Updated 2 months ago
nnnth / UFO
[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…
☆261 Updated last month
dvlab-research / VisionZip
Official repository for VisionZip (CVPR 2025)
☆391 Updated 5 months ago
LeapLabTHU / GSVA
[CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models
☆155 Updated last year
deepcs233 / Visual-CoT
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆412 Updated last year
mc-lan / Awesome-MLLM-Segmentation
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…
☆178 Updated 2 weeks ago
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆278 Updated 5 months ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆310 Updated 8 months ago
congvvc / HyperSeg
[CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
☆177 Updated last year
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆199 Updated last year
mc-lan / Text4Seg
[ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation
☆154 Updated last month
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆245 Updated 10 months ago
zamling / PSALM
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
☆266 Updated 11 months ago
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆201 Updated 5 months ago
360CVGroup / FG-CLIP
New generation of CLIP with fine grained discrimination capability, ICML2025
☆516 Updated last month
Osilly / Vision-R1
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages …
☆744 Updated 3 months ago
Wang-Xiaodong1899 / CVPR25-MLLM-Paper-List
🔥CVPR 2025 Multimodal Large Language Models Paper List
☆153 Updated 9 months ago
linhuixiao / Awesome-Visual-Grounding
[TPAMI 2025] Towards Visual Grounding: A Survey
☆269 Updated last month
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.
☆247 Updated 4 months ago
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆214 Updated 8 months ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆380 Updated 10 months ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
☆256 Updated last month
SkyworkAI / Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
☆578 Updated last year
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆782 Updated last week
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆249 Updated 2 months ago
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆209 Updated 2 months ago
appletea233 / AL-Ref-SAM2
[AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video…
☆91 Updated last year
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆335 Updated last year
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆230 Updated last month
baaivision / DIVA
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
☆298 Updated 11 months ago