[NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"
★271 | Nov 5, 2025 | Updated 4 months ago
Alternatives and similar repositories for UFO
Users that are interested in UFO are comparing it to the libraries listed below.
- ★12 | Nov 26, 2024 | Updated last year
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" | ★618 | Jan 17, 2026 | Updated 2 months ago
- ★31 | Jul 21, 2025 | Updated 8 months ago
- [ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface" | ★360 | Jan 14, 2025 | Updated last year
- ★21 | Jun 15, 2023 | Updated 2 years ago
- Boosting 3D Object Detection via Object-Focused Image Fusion | ★59 | Sep 11, 2022 | Updated 3 years ago
- ★44 | Jul 9, 2025 | Updated 8 months ago
- Official Repo For Pixel-LLM Codebase | ★1,565 | Feb 27, 2026 | Updated last month
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation | ★166 | Nov 8, 2025 | Updated 4 months ago
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" | ★270 | Dec 30, 2024 | Updated last year
- Code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" | ★23 | Nov 24, 2025 | Updated 4 months ago
- ★23 | Aug 20, 2024 | Updated last year
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". | ★180 | Dec 13, 2024 | Updated last year
- Progressive Language-guided Visual Learning for Multi-Task Visual Grounding | ★13 | May 9, 2025 | Updated 10 months ago
- [ICCV 2025] MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | ★22 | Sep 5, 2025 | Updated 6 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' | ★2,321 | Oct 29, 2025 | Updated 5 months ago
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling. | ★31 | Nov 13, 2025 | Updated 4 months ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | ★212 | Oct 15, 2025 | Updated 5 months ago
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness | ★26 | May 16, 2025 | Updated 10 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation" | ★128 | Feb 20, 2025 | Updated last year
- Vision Relation Transformer for Unbiased Scene Graph Generation (ICCV 2023) | ★22 | Mar 23, 2026 | Updated last week
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning | ★1,474 | Jun 26, 2025 | Updated 9 months ago
- VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning | ★330 | Feb 9, 2026 | Updated last month
- Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | ★14 | Apr 2, 2025 | Updated 11 months ago
- [AAAI2025 selected as oral] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | ★44 | Jul 2, 2025 | Updated 8 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models | ★164 | Sep 12, 2024 | Updated last year
- Code for Scaling Language-Free Visual Representation Learning (WebSSL) | ★244 | Apr 24, 2025 | Updated 11 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning | ★43 | Mar 2, 2026 | Updated 3 weeks ago
- Solve Visual Understanding with Reinforced VLMs | ★5,898 | Mar 12, 2026 | Updated 2 weeks ago
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. | ★257 | Feb 11, 2025 | Updated last year
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". | ★1,030 | Aug 4, 2025 | Updated 7 months ago
- [ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. | ★60 | Nov 10, 2025 | Updated 4 months ago
- [NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning | ★287 | Jul 15, 2025 | Updated 8 months ago
- [CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | ★19 | Apr 30, 2025 | Updated 11 months ago
- ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | ★28 | May 27, 2025 | Updated 10 months ago
- SFT+RL boosts multimodal reasoning | ★47 | Jun 27, 2025 | Updated 9 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… | ★951 | Aug 5, 2025 | Updated 7 months ago
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception | ★153 | Jan 10, 2026 | Updated 2 months ago
- MLLMSeg: Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder | ★51 | Aug 16, 2025 | Updated 7 months ago