jwehrmann / lmtd
Labeled Movie Trailer Dataset
☆16Updated 7 years ago
Alternatives and similar repositories for lmtd
Users who are interested in lmtd are comparing it to the repositories listed below
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.☆52Updated 3 years ago
- Code implementation for our ICPR 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"☆21Updated 4 years ago
- A repository for extracting CNN features from videos using PyTorch☆69Updated 2 years ago
- Listen to Look: Action Recognition by Previewing Audio (CVPR 2020)☆130Updated 3 years ago
- ☆22Updated last year
- PyTorch implementation of an audio-visual fusion video captioning model☆27Updated 6 years ago
- Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019)☆27Updated 5 years ago
- Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text.☆73Updated 4 years ago
- ☆23Updated 3 years ago
- PyTorch implementation of the paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning" (CVPR 2021)☆88Updated 3 years ago
- menovideo: pytorch library for video action recognition and video understanding☆29Updated 3 years ago
- Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"☆113Updated 4 years ago
- PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)☆125Updated 2 years ago
- A non-JIT implementation / replication of OpenAI's CLIP in PyTorch☆34Updated 4 years ago
- ☆31Updated 4 years ago
- [ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset☆90Updated last year
- Use CLIP to represent video for Retrieval Task☆69Updated 4 years ago
- ☆31Updated 3 years ago
- Audio Visual Instance Discrimination with Cross-Modal Agreement☆129Updated 3 years ago
- PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)☆143Updated 2 years ago
- Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020)☆90Updated 2 years ago
- Deep Audio-Visual Embedding network (DAVEnet) implementation in PyTorch☆65Updated 6 years ago
- M-VAD Names Dataset. Multimedia Tools and Applications (2019)☆20Updated 5 years ago
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval☆26Updated 3 years ago
- Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*☆31Updated 4 years ago
- Character Grounding and Re-Identification in Story of Videos and Text Descriptions☆10Updated 4 years ago
- The Evoked Expressions in Video dataset contains videos paired with the expected facial expressions over time exhibited by people reactin…☆38Updated 3 years ago
- Localized Narratives☆84Updated 3 years ago
- A curated list of resources on video summarization, a computer science topic that uses machine learning and deep learning☆42Updated 5 years ago
- ☆37Updated 3 years ago