WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆297 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for ViP-LLaVA
- LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆212 · Updated 3 months ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆245 · Updated 10 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆132 · Updated last month
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆175 · Updated 5 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆140 · Updated last month
- Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding" ☆217 · Updated 3 months ago
- Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770) ☆148 · Updated last month
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆186 · Updated 10 months ago
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ☆231 · Updated last month
- Official implementation of the Law of Vision Representation in MLLMs ☆131 · Updated this week
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of … ☆459 · Updated 3 months ago
- Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆142 · Updated last week
- [CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions ☆315 · Updated 4 months ago
- LLaVA-UHD: An LMM Perceiving Any Aspect Ratio and High-Resolution Images ☆318 · Updated last month
- When do we not need larger vision models?