RanaCM / DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12 · May 13, 2024 · Updated last year
Alternatives and similar repositories for DSU-AVO
Users that are interested in DSU-AVO are comparing it to the libraries listed below
- A pitch detection model trained to be robust against noise and reverberation environments. ☆27 · Jan 21, 2025 · Updated last year
- ☆25 · Mar 12, 2022 · Updated 3 years ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 · Aug 30, 2025 · Updated 5 months ago
- Code for "Distribution-based Emotion Recognition in Conversation" ☆19 · Feb 6, 2023 · Updated 3 years ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 · Jun 14, 2024 · Updated last year
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor ☆18 · Jun 5, 2023 · Updated 2 years ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆30 · May 27, 2023 · Updated 2 years ago
- An AR+AR TTS attempt. ☆18 · Jan 13, 2025 · Updated last year
- ☆59 · May 17, 2023 · Updated 2 years ago
- ☆19 · Mar 22, 2024 · Updated last year
- Pytorch implementation for “V2C: Visual Voice Cloning” ☆33 · Jan 28, 2023 · Updated 3 years ago
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition. ☆21 · May 26, 2025 · Updated 8 months ago
- Please visit: https://thuhcsi.github.io/icassp2021-emotion-tts/ ☆34 · Mar 17, 2023 · Updated 2 years ago
- [CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models. ☆111 · Jun 21, 2024 · Updated last year
- PyTorch implementation of "Lip to Speech Synthesis in the Wild with Multi-task Learning" (ICASSP2023) ☆70 · Mar 9, 2024 · Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆26 · Dec 12, 2024 · Updated last year
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (… ☆61 · Apr 4, 2024 · Updated last year
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆25 · Aug 11, 2024 · Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 10 months ago
- Code for ACL 2023 main conference paper "Back Translation for Speech-to-text Translation Without Transcripts". ☆12 · Oct 25, 2023 · Updated 2 years ago
- ☆13 · Sep 25, 2024 · Updated last year
- ☆13 · Oct 25, 2024 · Updated last year
- text to speech ☆10 · Mar 19, 2024 · Updated last year
- A neural speech codec based on discrete WavLM representations ☆24 · Aug 28, 2024 · Updated last year
- ☆13 · Oct 11, 2024 · Updated last year
- Pybind11 bindings for Kaldi ☆15 · Feb 1, 2026 · Updated 2 weeks ago
- A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts ☆16 · Dec 3, 2024 · Updated last year
- Project of Singing Voice Conversion. ☆16 · Oct 27, 2023 · Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- ☆14 · Aug 16, 2023 · Updated 2 years ago
- ☆82 · Jan 22, 2025 · Updated last year
- TTS Text Analyzer ☆32 · Jul 20, 2023 · Updated 2 years ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video ☆111 · Jun 21, 2024 · Updated last year
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP2023 ☆86 · Oct 10, 2023 · Updated 2 years ago
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023 ☆27 · Apr 27, 2023 · Updated 2 years ago
- Unofficial implementation of NANSY++ in Pytorch Lightning ☆50 · Mar 11, 2024 · Updated last year
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46 · Jul 2, 2024 · Updated last year
- Official implementation for the paper: A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Unit… ☆83 · Jan 7, 2023 · Updated 3 years ago
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors ☆35 · Feb 11, 2025 · Updated last year