LiWentomng / gradio-osprey-demo
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
☆15 Updated last year
Alternatives and similar repositories for gradio-osprey-demo:
Users who are interested in gradio-osprey-demo are comparing it to the repositories listed below.
- ☆18 Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 6 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆93 Updated 8 months ago
- ☆49 Updated 3 weeks ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆68 Updated 2 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆64 Updated 6 months ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. ☆18 Updated 2 years ago
- ☆19 Updated last year
- ☆41 Updated 3 months ago
- ☆58 Updated last year
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆44 Updated 4 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis. ☆57 Updated this week
- ☆33 Updated last month
- ☆27 Updated last week
- Conceptrol: Concept Control of Zero-shot Personalized Image Generation ☆15 Updated last week
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆68 Updated last month
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆40 Updated last year
- ☆111 Updated 7 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆27 Updated 4 months ago
- 🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition" ☆105 Updated 10 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated 7 months ago
- ☆25 Updated 8 months ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning ☆110 Updated 7 months ago
- Official implementation of TagAlign ☆34 Updated 3 months ago
- ☆49 Updated 3 months ago
- Simple script to parallelize download and extract files for SA-1B Dataset. ☆36 Updated 5 months ago
- VimTS: A Unified Video and Image Text Spotter ☆77 Updated 4 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆45 Updated 3 months ago
- LEO: A powerful Hybrid Multimodal LLM ☆14 Updated 2 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 Updated 11 months ago