abikaki / awesome-speech-emotion-recognition
Awesome lists about Speech Emotion Recognition
★ 96 · Updated 8 months ago
Alternatives and similar repositories for awesome-speech-emotion-recognition
Users that are interested in awesome-speech-emotion-recognition are comparing it to the libraries listed below
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation · ★ 176 · Updated last month
- ★ 68 · Updated 11 months ago
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark · ★ 270 · Updated 4 months ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units · ★ 42 · Updated 10 months ago
- ★ 121 · Updated 2 years ago
- EMO-SUPERB submission · ★ 45 · Updated 11 months ago
- [Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization · ★ 57 · Updated 5 months ago
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey". · ★ 155 · Updated this week
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… · ★ 109 · Updated 4 months ago
- A Compact and Effective Pretrained Model for Speech Emotion Recognition · ★ 47 · Updated last year
- Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech. · ★ 208 · Updated last year
- ★ 172 · Updated last year
- A collection of datasets for the purpose of emotion recognition/detection in speech. · ★ 371 · Updated 11 months ago
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 · ★ 215 · Updated last year
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations · ★ 172 · Updated last year
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models · ★ 162 · Updated last year
- a curated list of speech datasets (110+ datasets, 75+ easy to download) · ★ 148 · Updated 2 years ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants · ★ 272 · Updated last week
- [INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild. · ★ 52 · Updated last year
- Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning · ★ 94 · Updated 9 months ago
- PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised T… · ★ 194 · Updated 2 years ago
- VoiceLDM: Text-to-Speech with Environmental Context · ★ 182 · Updated last year
- Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen… · ★ 241 · Updated 3 years ago
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions" · ★ 43 · Updated 4 months ago
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for … · ★ 164 · Updated 3 months ago
- Official repository of NeXt-TDNN for speaker verification · ★ 78 · Updated 10 months ago
- Research code for the paper "Fine-tuning wav2vec2 for speaker recognition" found at https://arxiv.org/abs/2109.15053 · ★ 145 · Updated 3 years ago
- Official Code implementation for the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading" · ★ 75 · Updated 11 months ago
- Training code for FAcodec presented in NaturalSpeech3 · ★ 216 · Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. · ★ 88 · Updated last year