enrico310786 / image_text_retrieval_BLIP_BLIP2
Experiments with the LAVIS library to perform image-to-text and text-to-image retrieval with the BLIP and BLIP-2 models
☆15 · Updated 2 years ago
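For context, a retrieval experiment of the kind this repository describes typically looks like the following with LAVIS's feature-extractor interface. This is a minimal sketch, not code taken from the repository: the image path and the candidate captions are placeholders, and the exact scoring step may differ from the repo's scripts.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a BLIP feature extractor. BLIP-2 loads via "blip2_feature_extractor",
# but its image side yields per-query-token embeddings, so the scoring step differs slightly.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

# Placeholder inputs: one query image and a small candidate caption pool.
raw_image = Image.open("example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = ["a dog running on the beach", "a plate of pasta", "a city skyline at night"]
texts = [txt_processors["eval"](c) for c in captions]

with torch.no_grad():
    # Project both modalities into the shared embedding space (CLS token of each projection).
    img_emb = model.extract_features({"image": image}, mode="image").image_embeds_proj[:, 0, :]
    txt_emb = model.extract_features({"text_input": texts}, mode="text").text_embeds_proj[:, 0, :]

# The projections are already L2-normalized, so the dot product is a cosine-similarity score.
scores = (img_emb @ txt_emb.t()).squeeze(0)
for caption, score in sorted(zip(captions, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {caption}")
```

On a real dataset the same two projections would be computed for every image and caption, and ranking by cosine similarity in each direction gives the image2text and text2image retrieval results.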
Alternatives and similar repositories for image_text_retrieval_BLIP_BLIP2
Users interested in image_text_retrieval_BLIP_BLIP2 are comparing it to the repositories listed below
- Research code for the Multimodal-Cognition Team at Ant Group ☆169 · Updated 2 months ago
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation" ☆255 · Updated last year
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever ☆103 · Updated 6 months ago
- ☆79 · Updated last year
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆146 · Updated last month
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆154 · Updated 3 months ago
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022) ☆97 · Updated 2 years ago
- Code for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna. ☆270 · Updated 2 years ago
- AAAI 2024: Visual Instruction Generation and Correction ☆94 · Updated last year
- ☆47 · Updated 7 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support ☆145 · Updated 10 months ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆299 · Updated last year
- Multimodal chatbot with integrated computer vision capabilities, our 1st-gen LMM ☆101 · Updated last year
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023) ☆228 · Updated 2 years ago
- The official repository for Retrieval Augmented Visual Question Answering ☆242 · Updated 11 months ago
- [NeurIPS 2023] Official implementation of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆523 · Updated last year
- ☆72 · Updated 2 years ago
- VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model) ☆194 · Updated 2 years ago
- [ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆96 · Updated last week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆300 · Updated last year
- Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering" ☆277 · Updated 6 months ago
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023) ☆166 · Updated last year
- ☆215 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆277 · Updated last year
- Building a VLM model starting from basic modules ☆18 · Updated last year
- An open-source multimodal large language model based on baichuan-7b ☆72 · Updated 2 years ago
- Official code for the ICCV 2023 paper "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval" ☆40 · Updated 2 years ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆92 · Updated last year
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆238 · Updated last month
- A Chinese OFA model in the transformers architecture ☆136 · Updated 2 years ago