samsad35 / VQ-MAE-S-code
A Vector Quantized Masked AutoEncoder for speech emotion recognition
☆24 · Updated last year
Alternatives and similar repositories for VQ-MAE-S-code
Users interested in VQ-MAE-S-code are comparing it to the repositories listed below.
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆161 · Updated last month
- Official implementation of SpeechFormer written in Python (PyTorch) ☆80 · Updated 2 years ago
- A Compact and Effective Pretrained Model for Speech Emotion Recognition ☆41 · Updated 11 months ago
- ☆166 · Updated 11 months ago
- ☆18 · Updated last year
- ☆75 · Updated last month
- SpeechFormer++ in PyTorch ☆48 · Updated last year
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆37 · Updated 9 months ago
- Official implementation for the paper "Exploring Wav2vec 2.0 Fine-Tuning for Improved Speech Emotion Recognition" ☆150 · Updated 3 years ago
- [ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations ☆37 · Updated last year
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units ☆40 · Updated 8 months ago
- [ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition ☆24 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-Based Automatic Pipeline ☆156 · Updated 6 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆43 · Updated 2 months ago
- The official repository of the SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions ☆154 · Updated 2 months ago
- PyTorch implementation for "V2C: Visual Voice Cloning" ☆32 · Updated 2 years ago
- [CVPR 2023] Official code for the paper "Learning to Dub Movies via Hierarchical Prosody Models" ☆106 · Updated last year
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆53 · Updated last year
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆31 · Updated 3 months ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ☆122 · Updated 8 months ago
- EMO-SUPERB submission ☆44 · Updated 9 months ago
- ☆13 · Updated 11 months ago
- Official implementation of "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" ☆144 · Updated 7 months ago
- Deformable Speech Transformer (DST) ☆32 · Updated 10 months ago
- PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches for Speech Emotion Recognition Using Pre-trained Speech Models (… ☆60 · Updated 11 months ago
- Source code for the paper "Audio Captioning Transformer" ☆53 · Updated 3 years ago
- 3-D Convolutional Recurrent Neural Networks with Attention Model for Speech Emotion Recognition ☆40 · Updated 4 years ago
- Code for "Speech Emotion Recognition with Co-Attention Based Multi-level Acoustic Information" ☆145 · Updated last year
- [ACL 2024] PyTorch code for the paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing" ☆85 · Updated 7 months ago
- A list of papers (with available code), tutorials, and surveys on recent AI for emotion recognition (AI4ER) ☆19 · Updated last year