IsaacRodgz / multimodal-transformers-movies
Experiments with multimodal deep learning models based on transformers
☆12Updated 2 years ago
Alternatives and similar repositories for multimodal-transformers-movies:
Users who are interested in multimodal-transformers-movies are comparing it to the libraries listed below
- Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text.☆71Updated 3 years ago
- ☆16Updated 4 years ago
- Multimodal classification solution for the SIGIR eCOM using Co-attention and transformer language models☆19Updated 4 years ago
- Source code of our MM'22 paper Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning☆13Updated last year
- Multimodal short video classification, integrating video, image, audio and text modalities☆19Updated 5 years ago
- PyTorch implementation of the CVPR 2021 paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"☆86Updated 3 years ago
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 2 months ago
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022☆103Updated 2 years ago
- A collection of video-text retrieval work from the major top-tier vision conferences of recent years, kept in sync with my blog: https://blog.csdn.net/AAliuxiaolei/article/details/121433833☆14Updated 3 years ago
- EACL 2023 paper "MLASK: Multimodal Summarization of Video-based News Articles"☆10Updated last year
- Condensed Movies Challenge 2021☆19Updated 2 years ago
- Video Summarization With Spatiotemporal Vision Transformer☆22Updated last year
- Code and dataset of "MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos" in MM'20.☆53Updated last year
- ☆31Updated 3 years ago
- Benchmark for personality traits prediction with neural networks☆53Updated 6 months ago
- Source code for the AAAI 2021 paper "Movie Summarization via Sparse Graph Construction"☆31Updated 4 years ago
- Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)☆25Updated 4 years ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos☆43Updated 11 months ago
- MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)☆86Updated last year
- ☆12Updated last year
- PyTorch implementation of HANet: Hierarchical Alignment Networks for Video-Text Retrieval (ACM MM 2021).☆47Updated 3 years ago
- ICASSP 2023: "Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition"☆12Updated 4 months ago
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"☆19Updated last year
- EDUVSUM is a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features to identify important tempo…☆20Updated last year
- Implementation of "the first large-scale multimodal mixture of experts models." from the paper: "Multimodal Contrastive Learning with…☆27Updated 2 months ago
- FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition☆27Updated 4 months ago