alvinbhou / Video2Text
📺 An Encoder-Decoder Model for Sequence-to-Sequence learning: Video to Text
☆25 · Updated 7 years ago
Alternatives and similar repositories for Video2Text
Users who are interested in Video2Text are comparing it to the libraries listed below.
- This is a video captioning deep learning model built on PyTorch and implemented with the Transformer architecture. The video captioning task takes a video as input and outputs a single sentence describing the content of the whole video (assuming the video is short and can be described in one sentence). The main purpose of this repo is to help visually impaired… ☆99 · Updated 3 years ago
- Video classification on UCF101 using a CNN and an RNN, built on the PyTorch framework. ☆64 · Updated 2 years ago
- Video Captioning is an encoder-decoder model based on sequence-to-sequence learning. ☆140 · Updated last year
- Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and T… ☆643 · Updated last year
- Video to Text: a natural language description generator for a given video. [Video Captioning] ☆359 · Updated 3 years ago
- [AAAI 2020] Official implementation of VAANet for Emotion Recognition ☆83 · Updated 2 years ago
- Code for selecting an action based on multimodal inputs; in this case, the inputs are voice and text. ☆73 · Updated 4 years ago
- CNN-LSTM architecture implemented in PyTorch for video classification ☆301 · Updated 3 years ago
- ☆50 · Updated 3 years ago
- ☆16 · Updated last year
- Transformer & CNN image captioning model in PyTorch. ☆43 · Updated 2 years ago
- ☆75 · Updated 4 years ago
- Code release for ActionFormer (ECCV 2022) ☆537 · Updated last year
- PyTorch implementation of the Emotic CNN methodology to recognize emotions in images using context information. ☆148 · Updated 2 years ago
- Implementation of ViViT: A Video Vision Transformer ☆556 · Updated 4 years ago
- Using VideoBERT to tackle video prediction ☆134 · Updated 4 years ago
- A Jupyter notebook showing how to fine-tune the Vision Transformer on a facial expression dataset (FER-2013) ☆35 · Updated 4 years ago
- A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets. ☆74 · Updated 2 years ago
- Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020) ☆230 · Updated 2 years ago
- Implemented 3 different architectures to tackle the image captioning problem, i.e., Merged Encoder-Decoder, Bahdanau Attention, Transformer… ☆40 · Updated 4 years ago
- [ICASSP 2023] Official Implementation of ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis ☆29 · Updated 2 years ago
- Abnormal Human Behavior Detection / Road Accident Detection from Surveillance Videos / Real-World Anomaly Detection in Surveillance Videos… ☆169 · Updated 3 years ago
- Implementation of "Joint Image Emotion Classification and Distribution Learning via Deep Convolutional Neural Network" ☆36 · Updated 6 years ago
- ☆69 · Updated 4 years ago
- This is the repository for MMASD: A Multimodal Dataset for Autism Intervention Analysis. ☆39 · Updated 2 years ago
- ☆16 · Updated 5 years ago
- ☆80 · Updated 6 years ago
- PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models. ☆87 · Updated 4 years ago
- A PyTorch implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization" (IEEE IS… ☆91 · Updated 3 years ago
- ☆147 · Updated 3 years ago