Eurus-Holmes/Awesome-Multimodal-Research

Eurus-Holmes / Awesome-Multimodal-Research

A curated list of Multimodal Related Research.

☆1,393

Alternatives and similar repositories for Awesome-Multimodal-Research

Users who are interested in Awesome-Multimodal-Research are comparing it to the repositories listed below.

pliang279 / awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
☆6,913 · Aug 20, 2024 · Updated last year
yuewang-cuhk / awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
☆1,159 · Aug 19, 2022 · Updated 3 years ago
jayleicn / ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆730 · Aug 8, 2023 · Updated 2 years ago
BradyFU / Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
☆17,958 · Updated this week
yaohungt / Multimodal-Transformer
[ACL'19] [PyTorch] Multimodal Transformer
☆993 · Sep 12, 2022 · Updated 3 years ago
sangminwoo / awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
☆562 · Nov 4, 2024 · Updated last year
microsoft / Oscar
Oscar and VinVL
☆1,054 · Aug 28, 2023 · Updated 2 years ago
forence / Awesome-Visual-Captioning
This repository focuses on Image Captioning & Video Captioning & Seq-to-Seq Learning & NLP
☆410 · Nov 14, 2022 · Updated 3 years ago
facebookresearch / mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
☆5,636 · Jul 7, 2026 · Updated 3 weeks ago
zdou0830 / METER
METER: A Multimodal End-to-end TransformER Framework
☆377 · Nov 16, 2022 · Updated 3 years ago
Eurus-Holmes / MNMT
PyTorch implementation of Multimodal Neural Machine Translation (MNMT).
☆13 · Jan 21, 2021 · Updated 5 years ago
declare-lab / multimodal-deep-learning
This repository contains various models targeting multimodal representation learning, multimodal fusion for downstream tasks such as mul…
☆922 · Mar 15, 2023 · Updated 3 years ago
jokieleung / awesome-visual-question-answering
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Common…
☆672 · Jul 6, 2023 · Updated 3 years ago
salesforce / ALBEF
Code for ALBEF: a new vision-language pre-training method
☆1,755 · Sep 20, 2022 · Updated 3 years ago
facebookresearch / multimodal
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
☆1,728 · Updated this week
dandelin / ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
☆1,538 · Apr 3, 2024 · Updated 2 years ago
UKPLab / MMT-Retrieval
☆131 · Dec 10, 2022 · Updated 3 years ago
jason718 / awesome-self-supervised-learning
A curated list of awesome self-supervised methods
☆6,406 · Feb 24, 2026 · Updated 5 months ago
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,257 · Jun 2, 2026 · Updated last month
Eurus-Holmes / CMU11-785
11-785 Introduction to Deep Learning Fall 2018
☆39 · Jul 20, 2019 · Updated 7 years ago
airsplay / lxmert
PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
☆965 · Oct 22, 2022 · Updated 3 years ago
DirtyHarryLYL / Transformer-in-Vision
Recent Transformer-based CV and related works.
☆1,344 · Aug 22, 2023 · Updated 2 years ago
ShannonAI / OpenViDial
Code, Models and Datasets for OpenViDial Dataset
☆133 · Jan 22, 2022 · Updated 4 years ago
danieljf24 / awesome-video-text-retrieval
A curated list of deep learning resources for video-text retrieval.
☆644 · Oct 20, 2023 · Updated 2 years ago
ttengwang / Awesome_Prompting_Papers_in_Computer_Vision
A curated list of prompt-based papers in computer vision and vision-language learning.
☆927 · Dec 18, 2023 · Updated 2 years ago
ChenRocks / UNITER
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
☆799 · Jun 30, 2021 · Updated 5 years ago
thunlp / PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
☆4,324 · Jul 17, 2023 · Updated 3 years ago
Yutong-Zhou-cv / Awesome-Multimodality
A Survey on multimodal learning research.
☆332 · Aug 22, 2023 · Updated 2 years ago
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,229 · Jun 28, 2024 · Updated 2 years ago
dk-liang / Awesome-Visual-Transformer
A collection of papers on transformers in vision. Awesome Transformer with Computer Vision (CV)
☆3,589 · Jan 7, 2025 · Updated last year
jackroos / VL-BERT
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
☆742 · May 22, 2023 · Updated 3 years ago
HobbitLong / PyContrast
PyTorch implementation of Contrastive Learning methods
☆1,997 · Oct 4, 2023 · Updated 2 years ago
ZihengZZH / awesome-multimodal-knowledge-graph
A curated list of AWESOME papers, datasets and tutorials within Multimodal Knowledge Graph.
☆397 · Apr 15, 2025 · Updated last year
Separius / awesome-fast-attention
List of efficient attention modules
☆1,022 · Aug 23, 2021 · Updated 4 years ago
linjieli222 / HERO
Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
☆235 · Sep 16, 2021 · Updated 4 years ago
asheeshcric / awesome-contrastive-self-supervised-learning
A comprehensive list of awesome contrastive self-supervised learning papers.
☆1,310 · Sep 10, 2024 · Updated last year
TheShadow29 / awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
☆1,126 · Sep 21, 2025 · Updated 10 months ago
soujanyaporia / multimodal-sentiment-analysis
Attention-based multimodal fusion for sentiment analysis
☆367 · Apr 8, 2024 · Updated 2 years ago
openai / CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
☆34,083 · Mar 25, 2026 · Updated 4 months ago