dino-chiio / blip-vqa-finetune
This is an implementation of fine-tuning the BLIP model for Visual Question Answering.
☆68 Updated last year
Alternatives and similar repositories for blip-vqa-finetune
Users who are interested in blip-vqa-finetune are comparing it to the repositories listed below.
- InstructionGPT-4 ☆39 Updated last year
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆321 Updated 10 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆90 Updated last year
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆156 Updated 8 months ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆74 Updated last year
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support. ☆78 Updated 4 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 Updated 9 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)