LiWentomng / gradio-osprey-demo
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
☆16Updated last year
Alternatives and similar repositories for gradio-osprey-demo
Users who are interested in gradio-osprey-demo are comparing it to the repositories listed below.
- Precision Search through Multi-Style Inputs☆73Updated 4 months ago
- ☆20Updated 2 years ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model☆99Updated last year
- ☆195Updated 6 months ago
- ☆33Updated last year
- ☆71Updated 2 years ago
- Image Editing Anything☆116Updated 2 years ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning☆110Updated 6 months ago
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT☆136Updated last year
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆20Updated 3 years ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024)☆52Updated 5 months ago
- [CVPR 2024 Highlight] Official GraCo: Granularity-Controllable Interactive Segmentation.☆61Updated 8 months ago
- Codebase for the Recognize Anything Model (RAM)☆87Updated last year
- Official PyTorch implementation for TCSVT 23 "Detect Any Shadow: Segment Anything for Video Shadow Detection"☆65Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated last year
- ☆94Updated last year
- Simple script to parallelize download and extract files for SA-1B Dataset.☆37Updated 5 months ago
- ☆58Updated 2 years ago
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi…☆85Updated 2 years ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆127Updated 5 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆95Updated 10 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated last year
- Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor☆110Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆128Updated last year
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories☆83Updated 4 months ago
- ☆35Updated last year
- This repository is for the first survey on SAM & SAM2 for Videos.☆52Updated 7 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆40Updated last year
- 🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)☆199Updated last month
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation☆128Updated last year