enrico310786 / image_text_retrieval_BLIP_BLIP2
Experiments with LAVIS library to perform image2text and text2image retrieval with BLIP and BLIP2 models
☆15 · Updated 2 years ago
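For reference, a minimal sketch of the kind of retrieval this repo experiments with, using LAVIS's BLIP feature extractor (the image path and captions below are illustrative assumptions; swap `name="blip2_feature_extractor"` in for BLIP2; the repo's own scripts may differ):

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load BLIP as a feature extractor along with its image/text preprocessors.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

raw_image = Image.open("query.jpg").convert("RGB")  # hypothetical image path
captions = ["a dog on the beach", "a city skyline at night"]  # illustrative texts

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
text_inputs = [txt_processors["eval"](c) for c in captions]

# Project each modality into the shared low-dimensional embedding space.
img_feat = model.extract_features({"image": image}, mode="image")
txt_feat = model.extract_features({"text_input": text_inputs}, mode="text")

img_emb = img_feat.image_embeds_proj[:, 0, :]  # (1, dim), CLS-token projection
txt_emb = txt_feat.text_embeds_proj[:, 0, :]   # (n_texts, dim), CLS-token projection

# LAVIS returns unit-norm projected features; normalize defensively so the
# dot products below are cosine similarities either way.
sims = torch.nn.functional.normalize(img_emb) @ torch.nn.functional.normalize(txt_emb).t()
print(sims)  # (1, n_texts) similarity scores
```

Ranking the row of `sims` over a caption gallery gives image2text retrieval; ranking the transposed matrix over an image gallery gives text2image retrieval.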
Alternatives and similar repositories for image_text_retrieval_BLIP_BLIP2
Users interested in image_text_retrieval_BLIP_BLIP2 are comparing it to the repositories listed below
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation" ☆255 · Updated last year
- Research code for the Multimodal-Cognition Team in Ant Group ☆169 · Updated last month
- ☆30 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆152 · Updated 2 months ago
- Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆145 · Updated 3 weeks ago
- ☆186 · Updated last year
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support ☆141 · Updated 9 months ago
- ☆87 · Updated last year
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever ☆101 · Updated 5 months ago
- ☆57 · Updated last year
- [AAAI 2024] Visual Instruction Generation and Correction ☆93 · Updated last year
- ☆79 · Updated last year
- Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770) ☆158 · Updated last year
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, English/Chinese) ☆86 · Updated last year
- Multimodal chatbot with integrated computer vision capabilities, our 1st-gen LMM ☆101 · Updated last year
- ☆142 · Updated last year
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆331 · Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 · Updated last year
- Code for "VPGTrans: Transfer Visual Prompt Generator across LLMs"; VL-LLaMA, VL-Vicuna ☆271 · Updated 2 years ago
- InstructionGPT-4 ☆42 · Updated last year
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-Language Era" ☆210 · Updated last year
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023) ☆229 · Updated 2 years ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆276 · Updated last year
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆297 · Updated last year
- My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆265 · Updated 3 weeks ago
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆524 · Updated last year
- The official repo for "TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding" ☆44 · Updated last year
- Building a VLM starting from its basic modules ☆18 · Updated last year
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022) ☆96 · Updated 2 years ago
- ☆47 · Updated 7 months ago