dino-chiio / blip-vqa-finetuneLinks

This is implementation of finetuning BLIP model for Visual Question Answering

☆77

Alternatives and similar repositories for blip-vqa-finetune

Users that are interested in blip-vqa-finetune are comparing it to the libraries listed below

Sorting:

sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆101Updated 5 months ago
xmed-lab / CLIP_Surgery
[Pattern Recognition 25] CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
☆434Updated 4 months ago
b-hahn / CLIP
FInetuning CLIP for Few Shot Learning
☆43Updated 3 years ago
wangjunchi / LLMSeg
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
☆168Updated last year
WisconsinAIVision / ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆327Updated last year
waltonfuture / InstructionGPT-4
InstructionGPT-4
☆39Updated last year
yuhangzang / ContextDET
Contextual Object Detection with Multimodal Large Language Models
☆246Updated 9 months ago
NExT-ChatV / NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
☆246Updated last year
jvpassarelli / sam-clip-segmentation
Image Instance Segmentation - Zero Shot - OpenAI's CLIP + Meta's SAM
☆70Updated 2 years ago
2U1 / Llama3.2-Vision-Finetune
An open-source implementaion for fine-tuning Llama3.2-Vision series by Meta.
☆163Updated 2 months ago
marslanm / Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b…
☆77Updated last month
moein-shariatnia / Pix2Seq
Simple Implementation of Pix2Seq model for object detection in PyTorch
☆126Updated last year
zengyan-97 / X2-VLM
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
☆165Updated 11 months ago
damian0815 / finetune-clip-huggingface
Finetuning CLIP on a small image/text dataset using huggingface libs
☆48Updated 2 years ago
om-ai-lab / GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
☆69Updated last year
stevebottos / owl-vit-object-detection
object detection based on owl-vit
☆59Updated last year
anguyen8 / gScoreCAM
☆56Updated last year
zarzouram / image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
☆66Updated 2 years ago
LLaVA-Annonymous / LLaVA
☆41Updated 2 years ago
zjysteven / lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision,…
☆312Updated 5 months ago
RustamyF / clip-multimodal-ml
☆63Updated last year
2U1 / Molmo-Finetune
An open-source implementaion for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
☆56Updated 3 months ago
ByungKwanLee / Full-Segment-Anything
This is Pytorch Implementation Code for adding new features in code of Segment-Anything. Here, the features support batch-input on the fu…
☆156Updated last year
PengtaoJiang / Segment-Anything-CLIP
Connecting segment-anything's output masks with the CLIP model; Awesome-Segment-Anything-Works
☆197Updated 9 months ago
jchenghu / ExpansionNet_v2
Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
☆92Updated 7 months ago
mbzuai-oryx / groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha…
☆897Updated last month
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆91Updated last year
microsoft / UniCL
[CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space"
☆402Updated last year
kyegomez / PALI3
Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆144Updated this week
amazon-science / QA-ViT
☆69Updated last year