IsaacRodgz / multimodal-transformers-movies
Experiments with multimodal deep learning models based on transformers
☆12Updated 2 years ago
Alternatives and similar repositories for multimodal-transformers-movies:
Users who are interested in multimodal-transformers-movies are comparing it to the repositories listed below
- ☆16Updated 4 years ago
- Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text.☆73Updated 3 years ago
- Source code of our MM'22 paper Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning☆13Updated last year
- EACL 2023 paper "MLASK: Multimodal Summarization of Video-based News Articles"☆10Updated last year
- ☆12Updated last year
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 3 months ago
- A collection of video-text retrieval work from the major top vision conferences of recent years, kept in sync with my blog: https://blog.csdn.net/AAliuxiaolei/article/details/121433833☆14Updated 3 years ago
- The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)☆74Updated 2 years ago
- Multimodal classification solution for the SIGIR eCOM using Co-attention and transformer language models☆19Updated 4 years ago
- Multimodal short video classification task, integrating video, image, audio and text modes for short video classification☆19Updated 5 years ago
- Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)☆27Updated 2 years ago
- [ACM ICMR'25] Official repository for "eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos"☆33Updated 10 months ago
- Code and dataset of "MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos" in MM'20.☆53Updated last year
- Multi-modal Multi-label Emotion Recognition with Heterogeneous Hierarchical Message Passing☆16Updated 2 years ago
- Pytorch implementation for Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition☆60Updated 2 years ago
- [ACM MM 2021 Oral] Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation☆40Updated 3 years ago
- PyTorch implementation of HANet: Hierarchical Alignment Networks for Video-Text Retrieval (ACM MM 2021).☆47Updated 3 years ago
- PyTorch implementation of the CVPR 2021 paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"☆87Updated 3 years ago
- GPT-4V with Emotion☆91Updated last year
- Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)☆26Updated 5 years ago
- CM-BERT: Cross-Modal BERT for Text-Audio Sentiment Analysis (MM 2020)☆112Updated 4 years ago
- ☆55Updated 2 years ago
- EDUVSUM is a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features to identify important tempo…☆21Updated last year
- Official Repository of "Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation" accepted in BIGMM 2021☆14Updated 2 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.☆51Updated 3 years ago
- FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition☆28Updated 4 months ago
- Graph learning framework for long-term video understanding☆60Updated 2 months ago
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)☆33Updated last year
- Source code of our MM'22 paper Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning☆21Updated 10 months ago
- ☆19Updated 3 years ago