razvan404 / multimodal-speech-emotion-recognition

Multimodal SER model trained to recognise emotions from speech (text + acoustic data). DeBERTaV3 was fine-tuned to extract features and classify emotions from the text, and Wav2Vec2 to do the same from the audio; their features and classification outputs were then passed through an MLP to achieve better results…
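The fusion step described above can be sketched as a small late-fusion head in PyTorch. All dimensions, layer sizes, and the number of emotion classes below are illustrative assumptions, not values taken from the repository; the dummy tensors stand in for DeBERTaV3 and Wav2Vec2 outputs.

```python
import torch
import torch.nn as nn

class FusionMLP(nn.Module):
    """Hypothetical late-fusion head: concatenates text features, audio
    features, and each branch's class logits, then classifies with an MLP."""

    def __init__(self, text_dim=768, audio_dim=768, num_emotions=4, hidden=256):
        super().__init__()
        in_dim = text_dim + audio_dim + 2 * num_emotions
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_emotions),
        )

    def forward(self, text_feat, audio_feat, text_logits, audio_logits):
        # Fuse both modalities' features and per-branch predictions.
        fused = torch.cat([text_feat, audio_feat, text_logits, audio_logits], dim=-1)
        return self.mlp(fused)

# Dummy batch of 2 examples in place of the fine-tuned encoders' outputs.
model = FusionMLP()
text_feat = torch.randn(2, 768)
audio_feat = torch.randn(2, 768)
text_logits = torch.randn(2, 4)
audio_logits = torch.randn(2, 4)
out = model(text_feat, audio_feat, text_logits, audio_logits)
print(out.shape)  # torch.Size([2, 4])
```

In this setup the two encoders can be trained (or frozen) independently, while the MLP learns how much to trust each modality's features and predictions.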
