LiWentomng / gradio-osprey-demo
Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning.
☆14 Updated 9 months ago
Related projects:
- ☆17 Updated last year
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆34 Updated 2 months ago
- Detectron2 is a platform for object detection, segmentation and other visual recognition tasks. ☆16 Updated 2 years ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆52 Updated 5 months ago
- Simple script to parallelize download and extract files for SA-1B Dataset. ☆24 Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆58 Updated 2 weeks ago
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆36 Updated 5 months ago
- ☆32 Updated 8 months ago
- ☆20 Updated 9 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆33 Updated 3 weeks ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆25 Updated 3 months ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning ☆110 Updated 3 weeks ago
- ☆23 Updated last year
- Precision Search through Multi-Style Inputs ☆45 Updated last month
- Codebase for the Recognize Anything Model (RAM) ☆58 Updated 9 months ago
- ☆11 Updated 2 months ago
- [NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fi… ☆81 Updated 10 months ago
- ☆19 Updated last year
- ☆56 Updated last year
- [ICCV 2023] CLR: Channel-wise Lightweight Reprogramming for Continual Learning ☆28 Updated 3 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆36 Updated last month
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆45 Updated last month
- This repository is for the first survey on SAM for videos. ☆11 Updated last month
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation" ☆25 Updated 3 weeks ago
- ☆63 Updated 9 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆112 Updated 2 months ago
- ☆41 Updated this week
- ☆17 Updated last week
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆88 Updated 2 months ago