nnnth/UFO

[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"

☆281

Alternatives and similar repositories for UFO

Users who are interested in UFO are comparing it to the repositories listed below.

JIA-Lab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆636 · Jan 17, 2026 · Updated 6 months ago
baoxiaoyi / CoReS
code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation"
☆23 · Nov 24, 2025 · Updated 8 months ago
yayafengzi / LMM-HiMTok
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
☆98 · Jul 17, 2025 · Updated last year
JIA-Lab-research / VisionReasoner
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
☆348 · Feb 9, 2026 · Updated 5 months ago
bytedance / Sa2VA
Official Repo For Pixel-LLM Codebase: Sa2VA (PAMI-26), SAMTok (CVPR-26), VRT (Arxiv-25), SaSaSa2VA (1-st solution for LSVOS)
☆1,649 · Updated this week
congvvc / HyperSeg
[CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".
☆183 · Dec 13, 2024 · Updated last year
yayafengzi / ALToLLM
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
☆30 · May 27, 2025 · Updated last year
Haiyang-W / GiT
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
☆364 · Jan 14, 2025 · Updated last year
mc-lan / Text4Seg
[ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation
☆177 · Nov 8, 2025 · Updated 8 months ago
jcwang0602 / MLLMSeg
MLLMSeg: Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder
☆57 · Jun 12, 2026 · Updated last month
Eniac-Xie / FuseTeacher
☆12 · Nov 26, 2024 · Updated last year
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆247 · Jan 4, 2026 · Updated 6 months ago
haoy945 / DeMF
Boosting 3D Object Detection via Object-Focused Image Fusion
☆59 · Sep 11, 2022 · Updated 3 years ago
mc-lan / Awesome-MLLM-Segmentation
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…
☆231 · Jun 28, 2026 · Updated last month
berkeley-hipie / segllm
Code release for "SegLLM: Multi-round Reasoning Segmentation"
☆129 · Feb 20, 2025 · Updated last year
zamling / PSALM
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
☆269 · Dec 30, 2024 · Updated last year
GiantAILab / DeepSound-V1
Official code for DeepSound-V1
☆12 · May 14, 2025 · Updated last year
MICV-yonsei / CASS
[CVPR 2025] Official Pytorch Code for Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
☆50 · Mar 27, 2025 · Updated last year
jdg900 / MMR
[ICLR 2025] Official Pytorch Implementation of MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segm…
☆28 · Apr 3, 2025 · Updated last year
congvvc / InstructSeg
[ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"
☆56 · Feb 10, 2025 · Updated last year
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆289 · Jul 15, 2025 · Updated last year
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
nnnth / UniLIP
[ICLR 2026 🔥 ] Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
☆151 · Jan 26, 2026 · Updated 6 months ago
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,262 · Oct 29, 2025 · Updated 9 months ago
Ghy0501 / HiDe-LLaVA
[ACL'25 Main] Official Implementation of HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Languag…
☆55 · Jun 1, 2026 · Updated last month
IDEA-Research / Rex-Omni
[CVPR2026] Detect Anything via Next Point Prediction
☆1,526 · Feb 22, 2026 · Updated 5 months ago
rui-qian / READ
Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)
☆54 · Feb 4, 2026 · Updated 5 months ago
jailflip / jailflip-2025
☆22 · Jan 9, 2026 · Updated 6 months ago
Eniac-Xie / TEAM
☆22 · Jun 15, 2023 · Updated 3 years ago
MaverickRen / PixelLM
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
☆273 · Feb 11, 2025 · Updated last year
HarborYuan / ovsam
[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
☆1,031 · Aug 4, 2025 · Updated 11 months ago
IDEA-Research / ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
☆216 · Oct 15, 2025 · Updated 9 months ago
AI-Application-and-Integration-Lab / SAM4MLLM
[ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
☆51 · Mar 20, 2025 · Updated last year
JIA-Lab-research / LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
☆2,671 · Feb 16, 2025 · Updated last year
hustvl / GroundingSuite
[ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
☆77 · Jun 26, 2025 · Updated last year
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,018 · Jul 7, 2026 · Updated 3 weeks ago
GLUS-video / GLUS
[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…
☆70 · Jun 23, 2025 · Updated last year
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆964 · Aug 5, 2025 · Updated 11 months ago
rongfu-dsb / MPG-SAM2
[ICCV 2025] MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
☆23 · Sep 5, 2025 · Updated 10 months ago