The repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.
☆234 · Aug 27, 2022 · Updated 3 years ago
Alternatives and similar repositories for Multi-Modal-Transformer
Users who are interested in Multi-Modal-Transformer are comparing it to the repositories listed below.
- A curated list of Survey Papers on Deep Learning. ☆11 · Sep 5, 2023 · Updated 2 years ago
- Curated resources on what's happening in multimodal learning. Features recent papers, books, related lectures, and other relevant resou… ☆16 · Apr 28, 2023 · Updated 2 years ago
- Recent Transformer-based CV and related works. ☆1,340 · Aug 22, 2023 · Updated 2 years ago
- Course repository for the Spring 2023 COMP664 course "Deep Learning" at UNC ☆14 · Apr 17, 2023 · Updated 2 years ago
- Reading list for research topics in Masked Image Modeling ☆338 · Dec 3, 2024 · Updated last year
- Keras implementation of "DFNet: Discriminative feature extraction and integration network for salient object detection" ☆23 · Jan 5, 2021 · Updated 5 years ago
- Awesome list for research on CLIP (Contrastive Language-Image Pre-Training). ☆1,232 · Jun 28, 2024 · Updated last year
- LoMaR (Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction) ☆66 · Apr 3, 2025 · Updated 11 months ago
- Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision" ☆1,527 · Apr 3, 2024 · Updated last year
- A curated list of deep learning resources for video-text retrieval. ☆643 · Oct 20, 2023 · Updated 2 years ago
- Codebase for CVPR 2020 paper "Spatio-Temporal Graph for Video Captioning with Knowledge Distillation" ☆23 · Mar 4, 2020 · Updated 6 years ago
- TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale. ☆1,699 · Feb 23, 2026 · Updated last week
- Recent Advances in Vision and Language PreTrained Models (VL-PTMs) ☆1,155 · Aug 19, 2022 · Updated 3 years ago
- An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites ☆5,016 · Jul 30, 2024 · Updated last year
- Multi-Modal Transformer for Video Retrieval ☆265 · Oct 9, 2024 · Updated last year
- Recent Advances in Vision and Language Pre-training (VLP) ☆295 · Jun 6, 2023 · Updated 2 years ago
- Reading list for research topics in multimodal machine learning ☆6,824 · Aug 20, 2024 · Updated last year
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆729 · Aug 8, 2023 · Updated 2 years ago
- ☆280 · Mar 22, 2021 · Updated 4 years ago
- A curated list of vision-and-language pre-training (VLP). :-) ☆62 · Jul 6, 2022 · Updated 3 years ago
- Includes PyTorch -> Keras model porting code for DeiT models with fine-tuning and inference notebooks. ☆41 · Apr 30, 2022 · Updated 3 years ago
- TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?" ☆37 · Dec 17, 2021 · Updated 4 years ago
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts ☆188 · May 1, 2025 · Updated 10 months ago
- A Survey on multimodal learning research. ☆333 · Aug 22, 2023 · Updated 2 years ago
- [CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe ☆145 · Feb 23, 2026 · Updated last week
- ☆11 · Apr 6, 2019 · Updated 6 years ago
- Directed masked autoencoders ☆14 · Feb 20, 2026 · Updated 2 weeks ago
- ☆193 · Oct 22, 2022 · Updated 3 years ago
- A Survey on Transformer in CV. ☆192 · Jun 18, 2023 · Updated 2 years ago
- A curated list of Multimodal Related Research. ☆1,389 · Aug 5, 2023 · Updated 2 years ago
- Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629> ☆64 · Sep 17, 2025 · Updated 5 months ago
- ☆40 · Jul 18, 2022 · Updated 3 years ago
- Coarse-to-Fine Curriculum Learning ☆20 · Apr 28, 2020 · Updated 5 years ago
- ☆20 · Oct 3, 2022 · Updated 3 years ago
- Code release for SLIP: Self-supervision meets Language-Image Pre-training ☆787 · Feb 9, 2023 · Updated 3 years ago
- IBM Quantum Challenge Fall 2023 ☆10 · May 23, 2023 · Updated 2 years ago
- [WACV2023] This is the official PyTorch implementation of our paper "[Rethinking Rotation in Self-Supervised Contrastive Learning: Adapt… ☆12 · Feb 24, 2023 · Updated 3 years ago
- Implementation for NATv2. ☆23 · Feb 20, 2021 · Updated 5 years ago
- Code of Decomposition and Completion Network for Salient Object Detection, TIP 2021. ☆10 · Mar 30, 2023 · Updated 2 years ago