LiWentomng / gradio-osprey-demo
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
☆15Updated last year
Alternatives and similar repositories for gradio-osprey-demo:
Users who are interested in gradio-osprey-demo are comparing it to the libraries listed below:
- ☆18Updated last year
- ☆22Updated 6 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆28Updated 3 months ago
- LaVin-DiT☆17Updated last month
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆63Updated 2 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation"☆56Updated last week
- ☆52Updated last week
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆63Updated 4 months ago
- ☆22Updated last month
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆38Updated 9 months ago
- ☆34Updated 11 months ago
- ☆47Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆38Updated 2 weeks ago
- Simple script to parallelize download and extract files for SA-1B Dataset.☆32Updated 3 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024)☆40Updated 2 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Updated 2 months ago
- ☆62Updated last year
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.☆17Updated 2 years ago
- ☆86Updated 5 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"☆31Updated 7 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution☆39Updated 2 months ago
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT☆134Updated 8 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model