enrico310786 / image_text_retrieval_BLIP_BLIP2
Experiments with the LAVIS library to perform image2text and text2image retrieval with BLIP and BLIP2 models
☆15 · Updated 2 years ago
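As a rough illustration of what image2text and text2image retrieval involve, here is a minimal sketch that ranks candidates by cosine similarity between embeddings. It is not taken from the repository and does not call LAVIS; the embedding values, file names, captions, and the helper names `l2_normalize`, `cosine_similarity`, and `retrieve` are all hypothetical stand-ins for features that a BLIP or BLIP2 model would produce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(query_embedding, candidate_embeddings, top_k=1):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in candidate_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real pipeline would obtain these from the
# image and text encoders of a BLIP/BLIP2 model.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "beach.jpg": [0.1, 0.9, 0.2],
}
text_embeddings = {
    "a dog playing fetch": [0.8, 0.2, 0.1],
    "waves on a sandy beach": [0.0, 0.8, 0.3],
}

# text2image: find the image that best matches a caption.
best_image = retrieve(text_embeddings["a dog playing fetch"], image_embeddings)[0][0]
# image2text: find the caption that best matches an image.
best_caption = retrieve(image_embeddings["beach.jpg"], text_embeddings)[0][0]

print(best_image)    # dog.jpg
print(best_caption)  # waves on a sandy beach
```

Both directions use the same ranking step; only which side acts as the query and which as the candidate set changes.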
Alternatives and similar repositories for image_text_retrieval_BLIP_BLIP2
Users who are interested in image_text_retrieval_BLIP_BLIP2 are comparing it to the libraries listed below.
- Research Code for Multimodal-Cognition Team in Ant Group ☆169 · Updated 2 weeks ago
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation". ☆253 · Updated last year
- ☆79 · Updated last year
- ☆30 · Updated last year
- ☆141 · Updated last year
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆145 · Updated last week
- ☆87 · Updated last year
- AAAI 2024: Visual Instruction Generation and Correction ☆93 · Updated last year
- Vary-tiny codebase upon LAVIS (for training from scratch) and PDF image-text pair data (about 600k, including English/Chinese) ☆86 · Updated last year
- Toward Universal Multimodal Embedding ☆64 · Updated 3 months ago
- ☆186 · Updated last year
- Finetuning CLIP for Few Shot Learning ☆46 · Updated 3 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆210 · Updated last year
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆294 · Updated last year
- ☆48 · Updated 6 months ago
- This is the official repository for Retrieval Augmented Visual Question Answering ☆238 · Updated 10 months ago
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset) ☆191 · Updated last year
- InstructionGPT-4 ☆42 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆275 · Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆100 · Updated 5 months ago
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆229 · Updated 5 months ago
- LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer ☆389 · Updated this week
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM ☆101 · Updated last year
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆543 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆152 · Updated 2 months ago
- ☆21 · Updated 2 weeks ago
- Building a VLM model starts from the basic module. ☆18 · Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆58 · Updated 5 months ago
- ☆186 · Updated 8 months ago
- ☆376 · Updated 8 months ago