JialianW / GRiT
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
☆320 Updated last year
Alternatives and similar repositories for GRiT:
Users who are interested in GRiT are comparing it to the repositories listed below.
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of … ☆483 Updated 8 months ago
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆198 Updated 3 months ago
- Official Repository of ChatCaptioner ☆464 Updated 2 years ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆256 Updated last year
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆527 Updated 10 months ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts