aim-uofa / SegAgent
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
☆73 · Updated last month
Alternatives and similar repositories for SegAgent
Users who are interested in SegAgent are comparing it to the repositories listed below.
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆86 · Updated 5 months ago
- Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images" ☆143 · Updated 2 months ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ☆145 · Updated 3 weeks ago
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆55 · Updated 4 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything ☆69 · Updated last year
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆135 · Updated 9 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ☆149 · Updated last year
- [CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception ☆126 · Updated 3 months ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆75 · Updated last year
- ☆32 · Updated last year
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations ☆107 · Updated last month
- ☆39 · Updated 2 months ago
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models" ☆47 · Updated 7 months ago
- [NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆39 · Updated 5 months ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of… ☆135 · Updated 3 weeks ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆107 · Updated 6 months ago
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ☆168 · Updated 11 months ago
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ☆51 · Updated last year
- [NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples ☆62 · Updated 11 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆88 · Updated 3 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆45 · Updated 8 months ago
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation" ☆109 · Updated 3 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆70 · Updated 2 months ago
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World" ☆109 · Updated this week
- Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor ☆108 · Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆104 · Updated 4 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆43 · Updated last year
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation" ☆40 · Updated 6 months ago
- Code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" ☆18 · Updated 6 months ago
- [ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆39 · Updated 3 months ago