LiWentomng / gradio-osprey-demo
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
☆16 · Updated last year
Alternatives and similar repositories for gradio-osprey-demo
Users who are interested in gradio-osprey-demo are comparing it to the repositories listed below.
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. ☆19 · Updated 3 years ago
- ☆20 · Updated 2 years ago
- Precision Search through Multi-Style Inputs ☆73 · Updated 2 months ago
- Codebase for the Recognize Anything Model (RAM) ☆85 · Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆126 · Updated 11 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆40 · Updated last year
- Image Editing Anything ☆116 · Updated 2 years ago
- ☆31 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ☆238 · Updated 8 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated last year
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆98 · Updated last year
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning ☆123 · Updated 3 months ago
- [CVPR 2024 Highlight] Official GraCo: Granularity-Controllable Interactive Segmentation. ☆59 · Updated 7 months ago
- ☆193 · Updated 5 months ago
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770). ☆158 · Updated last year
- ☆94 · Updated last year
- YOLO-World + EfficientViT SAM ☆106 · Updated last year
- 🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025) ☆109 · Updated this week
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT ☆135 · Updated last year
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text ☆65 · Updated last year
- Official PyTorch implementation for TCSVT 23 "Detect Any Shadow: Segment Anything for Video Shadow Detection" ☆64 · Updated 10 months ago
- ☆91 · Updated 7 months ago
- 🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition" ☆108 · Updated last year
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆49 · Updated 3 months ago
- VimTS: A Unified Video and Image Text Spotter ☆78 · Updated 11 months ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆142 · Updated 5 months ago
- ☆68 · Updated last year
- [ECCV 2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ☆255 · Updated 9 months ago
- ☆41 · Updated 9 months ago
- A Simple Framework of Small-scale LMMs for Video Understanding ☆94 · Updated 4 months ago