abikaki / awesome-speech-emotion-recognition

😎 Awesome lists about Speech Emotion Recognition

☆100

Alternatives and similar repositories for awesome-speech-emotion-recognition

Users who are interested in awesome-speech-emotion-recognition are comparing it to the repositories listed below.

roudimit / whisper-flamingo
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
☆192 Updated 3 months ago
BenoitWang / Speech_Emotion_Diarization
☆69 Updated last year
emo-box / EmoBox
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
☆294 Updated 7 months ago
choijeongsoo / lip2speech-unit
[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units
☆47 Updated last year
KunZhou9646 / Mixed_Emotions
☆121 Updated 3 years ago
lifeiteng / naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
☆222 Updated last year
ASR-project / Multilingual-PR
Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen…
☆253 Updated 3 years ago
Choddeok / EmoSpherepp
[TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec…
☆111 Updated 2 months ago
glory20h / VoiceLDM
VoiceLDM: Text-to-Speech with Environmental Context
☆188 Updated last year
skit-ai / SpeechLLM
This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingfac…
☆125 Updated last year
X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆58 Updated last year
Choddeok / EmoSphere-TTS
[INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …
☆166 Updated 6 months ago
nii-yamagishilab / ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
☆180 Updated last year
imxtx / awesome-controllable-speech-synthesis
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".
☆192 Updated this week
Plachtaa / FAcodec
Training code for FAcodec presented in NaturalSpeech3
☆228 Updated last year
FrenchKrab / IS2023-powerset-diarization
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
☆91 Updated 2 years ago
keonlee9420 / Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised T…
☆194 Updated 3 years ago
joonaskalda / PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…
☆98 Updated 10 months ago
ddlBoJack / Awesome-Speech-Pretraining
Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech.
☆210 Updated last year
Zain-Jiang / Speech-Editing-Toolkit
It's a repository for implementations of neural speech editing algorithms.
☆200 Updated last year
thuhcsi / SECap
☆173 Updated last year
KAIST-AILab / SyncVSR
[Interspeech 2024] SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
☆59 Updated 8 months ago
HappyColor / Vesper
A Compact and Effective Pretrained Model for Speech Emotion Recognition
☆49 Updated last year
hayeong0 / Diff-HierVC
Official Pytorch Implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Pr…
☆232 Updated last year
YuanGongND / vocalsound
Dataset and baseline code for the VocalSound dataset (ICASSP2022).
☆154 Updated 3 years ago
JeffC0628 / awesome-voice-conversion
A curated list of awesome voice conversion, projects and communities.
☆252 Updated last week
MontrealCorpusTools / mfa-models
Collection of pretrained models for the Montreal Forced Aligner
☆177 Updated last month
SuperKogito / SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
☆383 Updated last year
yl4579 / PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
☆263 Updated 10 months ago
Audio-WestlakeU / FS-EEND
The official Pytorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based …
☆157 Updated last week