aim-uofa / SegAgent
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
☆26 · Updated 3 weeks ago
Alternatives and similar repositories for SegAgent:
Users who are interested in SegAgent are comparing it to the repositories listed below.
- [NeurIPS'24] A Simple Image Segmentation Framework via In-Context Examples ☆51 · Updated 5 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆35 · Updated 9 months ago
- ☆27 · Updated 2 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆20 · Updated 5 months ago
- [NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆34 · Updated last month
- ☆29 · Updated 2 weeks ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆32 · Updated 9 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Updated 5 months ago
- ☆16 · Updated this week
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆26 · Updated last month
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆27 · Updated last year
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆44 · Updated this week
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆34 · Updated last month
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆16 · Updated this week
- [NeurIPS 2024] SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow ☆27 · Updated 4 months ago
- The official implementation of JM3D. ☆29 · Updated last year
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆70 · Updated last month
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training" ☆34 · Updated 11 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆35 · Updated this week
- official code repo of CVPR 2025 paper PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation ☆17 · Updated 2 weeks ago
- ☆29 · Updated 6 months ago
- ☆58 · Updated last year
- [CVPR 2025] Open implementation of "RandAR" ☆69 · Updated last week
- ROOT: VLM based System for Indoor Scene Understanding and Beyond ☆24 · Updated 2 months ago
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆37 · Updated last year
- ☆16 · Updated last year
- ☆11 · Updated 3 months ago
- ☆40 · Updated 6 months ago
- A collection of vision foundation models unifying understanding and generation. ☆47 · Updated 3 months ago
- ☆54 · Updated 2 weeks ago