abikaki / awesome-speech-emotion-recognition
Awesome lists about Speech Emotion Recognition
★100 · Updated last year
Alternatives and similar repositories for awesome-speech-emotion-recognition
Users interested in awesome-speech-emotion-recognition are comparing it to the repositories listed below.
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation (★198, updated 6 months ago)
- ★70, updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark (★307, updated 10 months ago)
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units (★47, updated last year)
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… (★117, updated 4 months ago)
- Official code implementation for the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading" (★85, updated last year)
- EMO-SUPERB submission (★50, updated 3 months ago)
- ★121, updated 3 years ago
- [INTERSPEECH 2022] A dataset designed for multi-modal speaker diarization and lip-speech synchronization in the wild. (★58, updated 2 years ago)
- An evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey". (★205, updated last week)
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for … (★170, updated 8 months ago)
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization (★60, updated 10 months ago)
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published at Interspeech 2023. (★93, updated 2 years ago)
- A Compact and Effective Pretrained Model for Speech Emotion Recognition (★53, updated last year)
- A collection of datasets for emotion recognition/detection in speech. (★399, updated last year)
- ★96, updated this week
- Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions (★266, updated last year)
- Audio sample repository for the speech separation model "MossFormer2". (★170, updated last year)
- Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech. (★211, updated 2 years ago)
- An implementation of Speech Emotion Recognition based on the HuBERT model, trained with PyTorch and the HuggingFace framework, and fine-tuned … (★33, updated 3 years ago)
- ★176, updated last year
- PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP 2023) (★70, updated last year)
- Training, inference, and evaluation code for SpeechLLM models, plus details about the model releases on huggingfac… (★127, updated last year)
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… (★104, updated last year)
- Official PyTorch implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for V… (★243, updated last year)
- Dataset and baseline code for the VocalSound dataset (ICASSP 2022). (★157, updated 3 years ago)
- FACodec: Speech Codec with Attribute Factorization, used for NaturalSpeech 3 (★234, updated last year)
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations (★184, updated last year)
- VoiceLDM: Text-to-Speech with Environmental Context (★192, updated last year)
- Phoneme recognition using the pre-trained models Wav2vec2, HuBERT, and WavLM. Throughout this project, we specifically compared three differen… (★257, updated 3 years ago)