Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.
☆73 · Jun 7, 2021 · Updated 4 years ago
Alternatives and similar repositories for Multimodal-action-recognition
Users who are interested in Multimodal-action-recognition are comparing it to the libraries listed below.
- Official PyTorch Implementation for Continual Learning For On-Device Environmental Sound Classification ☆14 · Jul 19, 2022 · Updated 3 years ago
- Resources for: Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup (ACL SRW 2020) ☆11 · Sep 9, 2021 · Updated 4 years ago
- Collection of skeleton-based human action recognition ☆10 · Jun 28, 2020 · Updated 5 years ago
- Generalized cross-modal NNs; new audiovisual benchmark (IEEE TNNLS 2019) ☆31 · Apr 13, 2020 · Updated 5 years ago
- PyTorch implementation of DSR-RL for Video Summarization Task ☆12 · Aug 30, 2021 · Updated 4 years ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- (2020) Video Classification Neural Network ☆30 · Feb 18, 2020 · Updated 6 years ago
- PLLay: Efficient Topological Layer based on Persistence Landscapes ☆23 · Dec 10, 2020 · Updated 5 years ago
- Video Transformer Network ☆41 · Jun 8, 2021 · Updated 4 years ago
- [CVPR 2022] Cross-Architecture Self-supervised Video Representation Learning ☆24 · Jul 5, 2022 · Updated 3 years ago
- A PyTorch implementation of emotion recognition from videos ☆18 · Sep 15, 2020 · Updated 5 years ago
- Self-Supervised Learning by Cross-Modal Audio-Video Clustering (NeurIPS 2020) ☆91 · Oct 24, 2022 · Updated 3 years ago
- Fast Template Matching and Update for Video Object Tracking and Segmentation ☆26 · Oct 7, 2021 · Updated 4 years ago
- Chinese text generation; news and prose models and code are now open source ☆24 · Jun 12, 2023 · Updated 2 years ago
- PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529 ☆163 · Jul 19, 2022 · Updated 3 years ago
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval ☆26 · Jun 28, 2021 · Updated 4 years ago
- Video summarization LSTM-GAN PyTorch implementation ☆27 · Dec 6, 2019 · Updated 6 years ago
- ☆26 · Nov 30, 2019 · Updated 6 years ago
- ☆108 · Aug 24, 2022 · Updated 3 years ago
- An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation" ☆365 · Jul 25, 2024 · Updated last year
- Codebase for CVPR 2020 paper "Spatio-Temporal Graph for Video Captioning with Knowledge Distillation" ☆23 · Mar 4, 2020 · Updated 6 years ago
- This repository contains various models targeting multimodal representation learning, multimodal fusion for downstream tasks such as mul… ☆906 · Mar 15, 2023 · Updated 2 years ago
- Using Hotel Data to predict High Value And Potential VIP Guests ☆12 · Dec 27, 2021 · Updated 4 years ago
- FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition ☆33 · Nov 29, 2024 · Updated last year
- Video classification, youtube8m, Knowledge distillation, Tensorflow, NeXtVLAD ☆27 · Sep 5, 2019 · Updated 6 years ago
- Multimodal deep quality embedding network (MMDQEN) for affective video content analysis. (MM'19, TAFFC'20) ☆10 · Jul 24, 2021 · Updated 4 years ago
- KSSNet: Multi-Label Classification with Label Graph Superimposing ☆60 · Mar 3, 2020 · Updated 6 years ago
- Multi-Modal Transformer for Video Retrieval ☆265 · Oct 9, 2024 · Updated last year
- Code for IJCAI 2020 paper "Unsupervised Representation Learning by Predicting Random Distances" https://arxiv.org/abs/1912.12186 ☆29 · Apr 25, 2020 · Updated 5 years ago
- The source code of ACL 2020 paper: "Cross-Modality Relevance for Reasoning on Language and Vision" ☆27 · May 6, 2021 · Updated 4 years ago
- EsViT: Efficient self-supervised Vision Transformers ☆411 · Aug 28, 2023 · Updated 2 years ago
- Research code for "Training Vision-Language Transformers from Captions Alone" ☆33 · Jul 15, 2022 · Updated 3 years ago
- [ACL'19] [PyTorch] Multimodal Transformer ☆962 · Sep 12, 2022 · Updated 3 years ago
- ☆73 · Jun 3, 2022 · Updated 3 years ago
- Implementation of Cross-category Video Highlight Detection via Set-based Learning (ICCV 2021). ☆79 · Aug 27, 2021 · Updated 4 years ago
- PyTorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020) ☆75 · Sep 16, 2020 · Updated 5 years ago
- A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal" ☆84 · Feb 25, 2022 · Updated 4 years ago
- Code and data repository associated with the respiratory resistance sensitivity discrimination manuscript. ☆10 · Nov 20, 2025 · Updated 3 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 · Apr 12, 2023 · Updated 2 years ago