[ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition".
☆12Aug 4, 2023Updated 2 years ago
Alternatives and similar repositories for Emo-DNA
Users that are interested in Emo-DNA are comparing it to the libraries listed below
Sorting:
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆44Mar 3, 2025Updated last year
- text to speech☆10Mar 19, 2024Updated last year
- ☆10Apr 17, 2024Updated last year
- [ICASSP 2023] Official Tensorflow implementation of "Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech E…☆187May 15, 2024Updated last year
- [INTERSPEECH 2025] The official implementation of DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for…☆16Sep 7, 2025Updated 6 months ago
- DUSTED: Spoken-Term Discovery using Discrete Speech Units☆18Oct 2, 2024Updated last year
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning☆16Jun 23, 2024Updated last year
- 🏠🔍 Auto check for new apartments in Hamburg from various real estate provides☆16Jun 2, 2024Updated last year
- Aligntune : A Modular Toolkit for Post Training Alignment of LLMs☆35Feb 26, 2026Updated last week
- This repo is text to speech with learnable audio encoder without alignment with transcript reference☆54Sep 20, 2025Updated 5 months ago
- End-To-End SpeechSynthesis system with knowledge distillation☆18Jul 16, 2022Updated 3 years ago
- PyTorch Implementation of Stepwise Monotonic Multihead Attention similar to Enhancing Monotonicity for Robust Autoregressive Transformer …☆39May 16, 2021Updated 4 years ago
- [ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations☆40Dec 18, 2023Updated 2 years ago
- ☆28Jul 17, 2025Updated 7 months ago
- Transferability of cross-lingual and cross-age speech emotion recognition☆21Jun 30, 2023Updated 2 years ago
- An AR+AR TTS attempt.☆18Jan 13, 2025Updated last year
- This repository contains a short introduction on the topic of audio and speech processing -- from basics to applications.☆21Dec 20, 2023Updated 2 years ago
- ☆22Apr 21, 2022Updated 3 years ago
- A pitch detection model trained to be robust against noise and reverberation environments.☆27Jan 21, 2025Updated last year
- [INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …☆172May 20, 2025Updated 9 months ago
- [InterSpeech'2024] FluentEditor:Text-based Speech Editing by Considering Acoustic and Prosody Consistency☆60Oct 23, 2024Updated last year
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)☆59Jun 20, 2024Updated last year
- [ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition☆27Apr 11, 2024Updated last year
- The official implementation of the DIFFA series for dLLM-based large audio language model☆66Updated this week
- ☆27Aug 2, 2023Updated 2 years ago
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆61Apr 4, 2024Updated last year
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors☆37Feb 11, 2025Updated last year
- Repository related to Cranfield's AAI MSCs GDP☆11Apr 8, 2023Updated 2 years ago
- A robust pitch tracker using synchro-squeezed fft and frequency domain autocorrelation☆36Jan 17, 2024Updated 2 years ago
- Speech samples and code of BEdit-TTS☆34Oct 8, 2023Updated 2 years ago
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 6 months ago
- 单独维护的中文TTS☆34Oct 28, 2022Updated 3 years ago
- The implementation of paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"☆34Nov 23, 2023Updated 2 years ago
- ☆10Feb 10, 2026Updated 3 weeks ago
- Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS☆40Aug 4, 2023Updated 2 years ago
- VITS-based zero-shot TTS system varying with diverse style/speaker conditioning methods.☆36Sep 21, 2022Updated 3 years ago
- Pytorch implementation for “V2C: Visual Voice Cloning”☆33Jan 28, 2023Updated 3 years ago
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆62Sep 5, 2025Updated 6 months ago