rkmt / wesper-demo
☆28 · Updated last year
Alternatives and similar repositories for wesper-demo:
Users who are interested in wesper-demo are comparing it to the repositories listed below:
- PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight/fast speech vocoders. ☆52 · Updated this week
- ☆63 · Updated 3 weeks ago
- SelfRemaster: SSL Speech Restoration ☆88 · Updated last year
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ☆87 · Updated 4 months ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆15 · Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆83 · Updated last year
- This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamic… ☆49 · Updated 7 months ago
- Voice Activity Projection Models: Self-supervised learning of Turn-taking Events ☆64 · Updated 11 months ago
- Unofficial implementation of miipher ☆121 · Updated last year
- Repository for "LLM-based speaker diarization correction: A generalizable approach" paper ☆12 · Updated 9 months ago
- Real-time binaural target sound extraction model. ☆84 · Updated last year
- Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models. ☆94 · Updated 8 months ago
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E ☆138 · Updated 6 months ago
- ☆57 · Updated 10 months ago
- DDPM-based Pitch Generation and Pitch Controllable Voice Synthesis. ☆53 · Updated last year
- VITS-based zero-shot TTS system varying with diverse style/speaker conditioning methods. ☆36 · Updated 2 years ago
- ☆29 · Updated 3 years ago
- S3PRL-VC: A Voice Conversion Toolkit based on S3PRL ☆99 · Updated 10 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆96 · Updated 9 months ago
- Audio-visual diarization pipeline used for creating VoxConverse dataset ☆21 · Updated 2 months ago
- A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories. ☆22 · Updated last month
- Simple and lightweight Zero-shot Text-to-Speech (TTS) synthesis model ☆23 · Updated last week
- ☆32 · Updated last year
- ☆50 · Updated last month
- Survey on speech generation work. ☆18 · Updated last year
- ☆33 · Updated 3 months ago
- ☆38 · Updated 7 months ago
- ☆61 · Updated last year
- Clustering-based methods for overlapping diarization ☆81 · Updated last year
- ☆19 · Updated last year