pliang279 / awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
☆6,814 · Aug 20, 2024 · Updated last year
Alternatives and similar repositories for awesome-multimodal-ml
Users interested in awesome-multimodal-ml are comparing it to the libraries listed below.
- A curated list of multimodal-related research. ☆1,388 · Aug 5, 2023 · Updated 2 years ago
- Recent Advances in Vision and Language PreTrained Models (VL-PTMs) ☆1,155 · Aug 19, 2022 · Updated 3 years ago
- Latest Advances on Multimodal Large Language Models ☆17,337 · Feb 7, 2026 · Updated last week
- A curated list of awesome self-supervised methods ☆6,361 · Jul 3, 2024 · Updated last year
- A modular framework for vision & language multimodal research from Facebook AI Research (FAIR) ☆5,616 · Jan 12, 2026 · Updated last month
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. ☆1,699 · Updated this week
- [ACL'19] [PyTorch] Multimodal Transformer ☆958 · Sep 12, 2022 · Updated 3 years ago
- awesome grounding: A curated list of research papers in visual grounding ☆1,125 · Sep 21, 2025 · Updated 4 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆11,166 · Nov 18, 2024 · Updated last year
- The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights --… ☆36,351 · Updated this week
- PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers". ☆966 · Oct 22, 2022 · Updated 3 years ago
- Code for ALBEF: a new vision-language pre-training method ☆1,752 · Sep 20, 2022 · Updated 3 years ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆22,021 · Jan 23, 2026 · Updated 3 weeks ago
- CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image ☆32,562 · Jul 23, 2024 · Updated last year
- Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py… ☆24,993 · Updated this week
- A collection of resources and papers on Diffusion Models ☆12,273 · Aug 1, 2024 · Updated last year
- Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations". ☆746 · May 22, 2023 · Updated 2 years ago
- This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as mul… ☆905 · Mar 15, 2023 · Updated 2 years ago
- Oscar and VinVL ☆1,052 · Aug 28, 2023 · Updated 2 years ago
- [NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning ☆613 · Jan 27, 2024 · Updated 2 years ago
- Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning" ☆800 · Jun 30, 2021 · Updated 4 years ago
- A curated list of Visual Question Answering (VQA) (image/video question answering), Visual Question Generation, Visual Dialog, Visual Common… ☆671 · Jul 6, 2023 · Updated 2 years ago
- Must-read papers on graph neural networks (GNN) ☆16,723 · Dec 20, 2023 · Updated 2 years ago
- Multi-Task Vision and Language ☆825 · Feb 16, 2022 · Updated 4 years ago
- A curated list of awesome vision and language resources (still under construction... stay tuned!) ☆559 · Nov 4, 2024 · Updated last year
- A collection of AWESOME things about domain adaptation ☆5,402 · Dec 8, 2025 · Updated 2 months ago
- PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722 ☆5,116 · Feb 3, 2026 · Updated last week
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆723 · Aug 8, 2023 · Updated 2 years ago
- An open source implementation of CLIP. ☆13,383 · Updated this week
- Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision" ☆1,524 · Apr 3, 2024 · Updated last year
- PyTorch implementation of contrastive learning methods ☆1,996 · Oct 4, 2023 · Updated 2 years ago
- Must-read papers on prompt-based tuning for pre-trained language models. ☆4,297 · Jul 17, 2023 · Updated 2 years ago
- Collects papers on transformers for vision. Awesome Transformer with Computer Vision (CV) ☆3,567 · Jan 7, 2025 · Updated last year
- Google Research ☆37,261 · Updated this week
- A survey of multimodal learning research. ☆333 · Aug 22, 2023 · Updated 2 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,153 · Sep 30, 2025 · Updated 4 months ago
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ☆156,440 · Updated this week
- Transfer learning / domain adaptation / domain generalization / multi-task learning etc. Papers, codes, datasets, applications, tutorials… ☆14,273 · Feb 18, 2025 · Updated 11 months ago
- A comprehensive list of awesome contrastive self-supervised learning papers. ☆1,309 · Sep 10, 2024 · Updated last year
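Several entries above (CLIP, OpenCLIP, MoCo, ALBEF) are built on the same contrastive retrieval idea: embed images and texts into a shared space and pick the caption whose embedding is most similar to the image's. The sketch below is a toy illustration of that retrieval step only; the vectors are made up for the example and are not produced by any real CLIP model.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical joint-embedding vectors for one image and three captions
# (toy numbers for illustration, not real model outputs).
image_emb = l2_normalize(np.array([0.9, 0.1, 0.2]))
text_embs = l2_normalize(np.array([
    [0.8, 0.2, 0.1],   # "a photo of a dog"
    [0.1, 0.9, 0.3],   # "a photo of a cat"
    [0.2, 0.1, 0.9],   # "a photo of a car"
]))

# After normalization, cosine similarity is a plain dot product;
# the most relevant caption is the argmax over similarities.
sims = text_embs @ image_emb
best = int(np.argmax(sims))
print(best)  # prints 0: the first caption matches the image embedding best
```

In the real libraries the embeddings come from trained image and text encoders, but the ranking step at inference time is exactly this normalized dot product.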