eduardorochasoares / easytopic
A pipeline architecture for temporal segmentation of video lectures.
☆11 · Updated 4 years ago
Alternatives and similar repositories for easytopic:
Users who are interested in easytopic are comparing it to the libraries listed below.
- ☆22 · Updated 3 years ago
- Official repository of the paper "Unsupervised Audio-Visual Lecture Segmentation", WACV 2023 · ☆12 · Updated last month
- Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR 2022) · ☆12 · Updated 3 years ago
- Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Lo… · ☆39 · Updated last year
- A collection of videos annotated with timelines where each video is divided into segments, and each segment is labelled with a short free… · ☆25 · Updated 3 years ago
- Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC) · ☆27 · Updated 2 years ago
- M-VAD Names Dataset. Multimedia Tools and Applications (2019) · ☆20 · Updated 5 years ago
- ☆44 · Updated 3 years ago
- Uses CLIP to represent videos for retrieval tasks · ☆69 · Updated 4 years ago
- ☆11 · Updated 4 years ago
- Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference" · ☆160 · Updated 4 years ago
- This repo is used to download the videos for the SVD dataset · ☆18 · Updated 4 years ago
- SpeechYOLO (Interspeech 2019) · ☆43 · Updated 2 years ago
- ☆40 · Updated last year
- A bilingual dataset for image captioning · ☆17 · Updated 4 years ago
- Dense video captioning in PyTorch · ☆41 · Updated 5 years ago
- VisualMRC: Machine Reading Comprehension on Document Images (AAAI 2021) · ☆54 · Updated 3 weeks ago
- A one-stop shop for YouCook2 info such as leaderboard and recent advances on (cooking) video retrieval and captioning · ☆40 · Updated 2 years ago
- 🎁 A Large-scale Multi-modal E-Commerce Products Dataset (LTDL@IJCAI-21 Best Dataset & Pattern Recognition 2023) · ☆29 · Updated last year
- Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020 · ☆80 · Updated 4 years ago
- Multi-sense word embeddings from visual co-occurrences · ☆25 · Updated 5 years ago
- Multitask Multilingual Multimodal Pre-training · ☆71 · Updated 2 years ago
- Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19 · ☆32 · Updated 5 years ago
- Code, models and datasets for the OpenViDial dataset · ☆131 · Updated 3 years ago
- Code and Resources for the Transformer Encoder Reasoning Network (TERN): https://arxiv.org/abs/2004.09144 · ☆58 · Updated last year
- Feature Re-Learning with Data Augmentation for Video Relevance Prediction · ☆20 · Updated 2 years ago
- A collection of models for image<->text generation in ACM MM 2021 · ☆66 · Updated 3 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers · ☆51 · Updated 3 years ago
- Procedural Reasoning Networks · ☆7 · Updated 4 years ago
- ☆53 · Updated 3 years ago