IsaacRodgz / multimodal-transformers-movies
Experiments with multimodal deep learning models based on transformers
☆12 · Updated 3 years ago
Alternatives and similar repositories for multimodal-transformers-movies
Users who are interested in multimodal-transformers-movies are comparing it to the libraries listed below.
- Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text. ☆73 · Updated 4 years ago
- ☆31 · Updated 4 years ago
- ☆16 · Updated 5 years ago
- EACL 2023 paper "MLASK: Multimodal Summarization of Video-based News Articles" ☆12 · Updated 2 years ago
- ☆58 · Updated last month
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval" (CVPR 2022) ☆115 · Updated 3 years ago
- Graph learning framework for long-term video understanding ☆71 · Updated 5 months ago
- PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning ☆89 · Updated 4 years ago
- Using VideoBERT to tackle video prediction ☆133 · Updated 4 years ago
- PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) ☆124 · Updated 2 years ago
- [TMLR 2022] High-Modality Multimodal Transformer ☆117 · Updated last year
- PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops) ☆144 · Updated 2 years ago
- The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023) ☆82 · Updated 2 years ago
- PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities" ☆26 · Updated 11 months ago
- Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification ☆135 · Updated 4 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers. ☆53 · Updated 3 years ago
- Official code repo for TCLR: Temporal Contrastive Learning for Video Representation [CVIU 2022] ☆40 · Updated last year
- Easiest way to fine-tune HuggingFace video classification models ☆147 · Updated 2 years ago
- Official implementation of "Geometric Multimodal Contrastive Representation Learning" (https://arxiv.org/abs/2202.03390) ☆28 · Updated last year
- [ACM MM 2021 Oral] Exploiting BERT for Multimodal Target Sentiment Classification through Input Space Translation ☆40 · Updated 4 years ago
- Multimodal short-video classification, integrating the video, image, audio, and text modalities ☆19 · Updated 5 years ago
- Multi-modal transformer approach for joint video summarization and highlight detection based on natural-language queries ☆16 · Updated last year
- Code for the Video Similarity Challenge. ☆81 · Updated last year
- Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019) ☆30 · Updated 5 years ago
- Repository for the Multilingual-VQA task, created during the HuggingFace JAX/Flax community week. ☆34 · Updated 4 years ago
- [CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos ☆37 · Updated 11 months ago
- Video classification using the UCF101 dataset for action recognition. We extract SIFT, MFCC and STIP features from the videos, we encode … ☆30 · Updated 5 years ago
- Video Summarization With Spatiotemporal Vision Transformer ☆23 · Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 3 years ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ☆51 · Updated 5 months ago