enrico310786 / image_text_retrieval_BLIP_BLIP2
Experiments with the LAVIS library to perform image-to-text and text-to-image retrieval with the BLIP and BLIP2 models
☆15 · Updated 2 years ago
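The retrieval workflow described above (encode images and captions with a BLIP feature extractor from LAVIS, then rank by similarity of their projected embeddings) can be sketched with the public LAVIS feature-extractor API. This is a minimal sketch, not the repository's exact code; the image path and captions are placeholders.

```python
# Minimal sketch of BLIP-based image/text retrieval with LAVIS.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a BLIP feature extractor together with its image/text preprocessors.
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip_feature_extractor", model_type="base", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")          # placeholder image
captions = ["a dog running on the beach", "a plate of pasta"]  # placeholder texts

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
text_inputs = [txt_processors["eval"](c) for c in captions]
sample = {"image": image, "text_input": text_inputs}

with torch.no_grad():
    # Unimodal features projected into the shared embedding space.
    img_feat = model.extract_features(sample, mode="image")
    txt_feat = model.extract_features(sample, mode="text")

    # Similarity between the [CLS] projections ranks captions for the image
    # (image2text); the projections are normalized, so this is cosine similarity.
    sims = img_feat.image_embeds_proj[:, 0, :] @ txt_feat.text_embeds_proj[:, 0, :].t()

print(sims)  # higher score = better image-text match
```

The same similarity matrix, read column-wise, ranks images for a given caption (text2image).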
Alternatives and similar repositories for image_text_retrieval_BLIP_BLIP2
Users interested in image_text_retrieval_BLIP_BLIP2 are comparing it to the repositories listed below
- Research Code for Multimodal-Cognition Team in Ant Group ☆172 · Updated 3 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆105 · Updated 8 months ago
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation". ☆253 · Updated last year
- Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆146 · Updated 2 weeks ago
- ☆79 · Updated last year
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM ☆101 · Updated last year
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆302 · Updated 2 years ago
- ☆142 · Updated last year
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆90 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆153 · Updated 4 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆303 · Updated last year
- A Chinese OFA model implemented in the transformers architecture ☆138 · Updated 2 years ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support. ☆149 · Updated 11 months ago
- ☆88 · Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 · Updated 2 years ago
- ☆47 · Updated 9 months ago
- ☆187 · Updated 11 months ago
- An open-source, commercially usable multimodal model supporting bilingual Chinese-English vision-text dialogue. ☆377 · Updated 2 years ago
- Vary-tiny codebase built upon LAVIS (for training from scratch) and a PDF image-text pairs dataset (about 600k samples, English/Chinese) ☆86 · Updated last year
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023) ☆228 · Updated 2 years ago
- ☆31 · Updated last year
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆93 · Updated last year
- AAAI 2024: Visual Instruction Generation and Correction ☆96 · Updated last year
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆46 · Updated last year
- An open-source multimodal large language model based on baichuan-7b ☆72 · Updated 2 years ago
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆241 · Updated 2 months ago
- ☆187 · Updated last year
- InstructionGPT-4 ☆42 · Updated 2 years ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆211 · Updated last year
- ☆72 · Updated 2 years ago