jwehrmann / lmtd
Labeled Movie Trailer Dataset
☆16 Updated 7 years ago
Alternatives and similar repositories for lmtd
Users who are interested in lmtd are comparing it to the repositories listed below
- ☆37 Updated 3 years ago
- Code implementation for our ICPR, 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings" ☆21 Updated 3 years ago
- Listen to Look: Action Recognition by Previewing Audio (CVPR 2020) ☆129 Updated 3 years ago
- A repository for extracting CNN features from videos using PyTorch ☆69 Updated 2 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers. ☆51 Updated 3 years ago
- Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020) ☆90 Updated 2 years ago
- Audio Visual Instance Discrimination with Cross-Modal Agreement ☆129 Updated 3 years ago
- ☆22 Updated last year
- A Video Summarization framework for implementation and benchmark of Deep Learning models ☆34 Updated 8 months ago
- ☆35 Updated 6 years ago
- ☆32 Updated 6 years ago
- An implementation of the paper "Contextualize, Show and Tell: A Neural Visual Storyteller." presented at the Storytelling Workshop, co-lo… ☆33 Updated 6 years ago
- Code for the paper: Audio-Visual Model Distillation Using Acoustic Images ☆21 Updated 2 years ago
- Content-Based Video-Music Retrieval using Soft Intra-Modal Structure Constraint ☆61 Updated 7 years ago
- ☆21 Updated 5 years ago
- Using an LSTM and 4d convolutional network for lip reading ☆12 Updated 7 years ago
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022 ☆106 Updated 2 years ago
- A Dataset for Grounded Video Description ☆162 Updated 3 years ago
- Pytorch implementation of audio-visual fusion video captioning model ☆26 Updated 6 years ago
- Tools for loading video dataset and transforms on video in pytorch. You can directly load video files without preprocessing. ☆69 Updated 2 years ago
- Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video" ☆113 Updated 4 years ago
- Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018 ☆184 Updated 4 years ago
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning ☆90 Updated last year
- Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC) ☆27 Updated 2 years ago
- A length-controllable and non-autoregressive image captioning model. ☆68 Updated 3 years ago
- A dataset of debunked and verified user-generated videos. ☆30 Updated 6 years ago
- A unified framework to jointly model images, text, and human attention traces. ☆78 Updated 3 years ago
- Easy to use video deep features extractor ☆315 Updated 4 years ago
- Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019) ☆26 Updated 5 years ago
- Cross-model active contrastive coding ☆22 Updated 4 years ago