enrico310786 / image_text_retrieval_BLIP_BLIP2
Experiments with the LAVIS library to perform image-to-text and text-to-image retrieval with BLIP and BLIP2 models
☆13Updated last year
Related projects
Alternatives and complementary repositories for image_text_retrieval_BLIP_BLIP2
- Research Code for Multimodal-Cognition Team in Ant Group☆123Updated 4 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆122Updated 4 months ago
- [CVPR 2023 Workshop] Code to reproduce the results of our solutions on both tracks of the Meta AI Video Similarity Challenge (CVPR 2023 Wor…☆48Updated last year
- Chinese OFA model in the transformers structure☆123Updated last year
- ☆66Updated last year
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation☆77Updated last week
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated 10 months ago
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".☆221Updated 9 months ago
- Repository for the MM '23 accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆43Updated 10 months ago
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever.☆69Updated 2 months ago
- ☆84Updated 4 months ago
- Multimodal chatbot with computer vision capabilities integrated☆99Updated 6 months ago
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)☆220Updated last year
- ☆166Updated 4 months ago
- Precision Search through Multi-Style Inputs☆54Updated 4 months ago
- ☆156Updated 8 months ago
- ☆30Updated 6 months ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks☆284Updated 10 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated 2 weeks ago
- TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI 2023 Oral]☆53Updated last year
- Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".☆119Updated 2 weeks ago
- Chinese CLIP models with SOTA performance.☆48Updated last year
- Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).☆148Updated last month
- Chinese encoder for CLIP☆21Updated 2 years ago
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".☆227Updated 5 months ago
- ☆106Updated 9 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆49Updated 2 months ago
- ☆77Updated 6 months ago
- ☆231Updated last year