zhjohnchan / awesome-vision-and-language-pretraining
A curated list of vision-and-language pre-training (VLP). :-)
☆57 · Updated 2 years ago
Alternatives and similar repositories for awesome-vision-and-language-pretraining:
Users that are interested in awesome-vision-and-language-pretraining are comparing it to the libraries listed below
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data ☆39 · Updated last year
- This repo contains code and instructions for the baselines in the VLUE benchmark. ☆41 · Updated 2 years ago
- 🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral) ☆63 · Updated last year
- [ICLR 2023] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆38 · Updated last year
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 6 months ago
- The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models". ☆32 · Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 · Updated 8 months ago
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆58 · Updated last year
- ☆84 · Updated 2 years ago
- Colorful Prompt Tuning for Pre-trained Vision-Language Models ☆48 · Updated 2 years ago
- ☆47 · Updated last year
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 2 years ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆71 · Updated last year
- ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration ☆56 · Updated last year
- PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE) ☆23 · Updated 2 years ago
- Implementation of the Benchmark Approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering… ☆28 · Updated 2 years ago
- ☆58 · Updated last year
- [CVPR 2023] The official dataset of "Advancing Visual Grounding with Scene Knowledge: Benchmark and Method" ☆29 · Updated last year
- ☆87 · Updated last year
- Code for the paper "RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning" (EMNLP'23 Findings) ☆25 · Updated 9 months ago
- Code for "Multitask Vision-Language Prompt Tuning" (https://arxiv.org/abs/2211.11720) ☆55 · Updated 8 months ago
- Source code for the EMNLP 2022 paper "PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models" ☆48 · Updated 2 years ago
- CVPR 2022 (Oral) PyTorch code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment ☆22 · Updated 2 years ago
- ☆101 · Updated 2 years ago
- Implementation of the paper https://arxiv.org/abs/2210.04559 ☆53 · Updated 2 years ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning ☆45 · Updated 9 months ago
- Code and data for ImageCoDe, a contextual vision-and-language benchmark ☆39 · Updated 11 months ago
- Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023) ☆89 · Updated last year