jwehrmann / lmtd
Labeled Movie Trailer Dataset
☆16Updated 7 years ago
Alternatives and similar repositories for lmtd
Users that are interested in lmtd are comparing it to the libraries listed below
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.☆51Updated 3 years ago
- Code implementation for our ICPR, 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"☆21Updated 4 years ago
- ☆22Updated last year
- Sapsucker Woods 60 Audiovisual Dataset☆15Updated 2 years ago
- ☆23Updated 3 years ago
- A repository for extracting CNN features from videos using PyTorch☆69Updated 2 years ago
- ☆26Updated 3 years ago
- Using an LSTM and 4d convolutional network for lip reading☆12Updated 7 years ago
- [CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)☆60Updated 3 years ago
- ☆55Updated 2 years ago
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning☆90Updated last year
- EDUVSUM is a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features to identify important tempo…☆21Updated last year
- In-the-wild Question Answering☆15Updated 2 years ago
- A unified framework to jointly model images, text, and human attention traces.☆78Updated 4 years ago
- multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo. contains the traini…☆40Updated 2 years ago
- Localized Narratives☆82Updated 3 years ago
- Implementations of Transformers for Video☆23Updated 4 years ago
- Use CLIP to represent video for Retrieval Task☆69Updated 4 years ago
- PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos"☆39Updated 4 years ago
- ☆44Updated 3 years ago
- M-VAD Names Dataset. Multimedia Tools and Applications (2019)☆20Updated 5 years ago
- 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo☆35Updated 2 years ago
- Using a CNN-LSTM hybrid network to generate captions for images☆17Updated 5 years ago
- Screenplay Summarization using Latent Narrative Structure☆38Updated 2 years ago
- ☆32Updated 2 years ago
- AViD Dataset: Anonymized Videos from Diverse Countries☆56Updated 2 years ago
- Repository for Multilingual-VQA task created during HuggingFace JAX/Flax community week.☆34Updated 3 years ago
- Benchmark data and code for Question-Answering on Movie stories☆44Updated 5 years ago
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset☆90Updated last year
- Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020)☆90Updated 2 years ago