LiWentomng / gradio-osprey-demo
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
☆14 Updated 11 months ago
Related projects
Alternatives and complementary repositories for gradio-osprey-demo
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆60 Updated 2 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆61 Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 Updated last month
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆17 Updated last month
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆36 Updated 2 weeks ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆58 Updated 3 weeks ago
- ☆18 Updated last year
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. ☆16 Updated 2 years ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 Updated 4 months ago
- A Training-free Iterative Framework for Long Story Visualization ☆62 Updated this week
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆38 Updated 7 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆39 Updated 3 months ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning ☆110 Updated 3 months ago
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi… ☆82 Updated last year
- [CVPR 2024] Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models ☆64 Updated 7 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 Updated last week
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ☆40 Updated last month
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 Updated last week
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆89 Updated 2 weeks ago
- 【ECCV2024】The official repo of Griffon series ☆106 Updated 2 weeks ago
- ☆57 Updated last year
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆47 Updated 6 months ago
- Precision Search through Multi-Style Inputs ☆54 Updated 3 months ago
- Simple script to parallelize download and extract files for SA-1B Dataset. ☆30 Updated last month
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆37 Updated 2 weeks ago
- ☆19 Updated last year
- Distilling the powerful segment anything models into lightweight ones for efficient segmentation. ☆29 Updated last year
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 Updated 7 months ago
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆35 Updated 3 weeks ago
- Official implementation of TagAlign ☆32 Updated 7 months ago