rkmt / wesper-demo
★24 Updated 10 months ago
Related projects
Alternatives and complementary repositories for wesper-demo
- Voice Activity Projection Models: Self-supervised learning of Turn-taking Events ★39 Updated 5 months ago
- 🎼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ★15 Updated 8 months ago
- PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders. ★47 Updated last month
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ★42 Updated 2 months ago
- Collection of scripts from mHuBERT-147. ★22 Updated this week
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ★46 Updated 5 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ★80 Updated this week
- ★18 Updated last year
- ★81 Updated 2 months ago
- Real-time binaural target sound extraction model. ★74 Updated 7 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ★88 Updated 3 months ago
- Google's SoundStorm: Efficient Parallel Audio Generation ★129 Updated last year
- This is a repository of neural full-rank spatial covariance analysis with speaker activity (neural FCASA). ★24 Updated 5 months ago
- ★17 Updated 3 months ago
- A sequence-to-sequence voice conversion toolkit. ★86 Updated 4 months ago
- S3PRL-VC: A Voice Conversion Toolkit based on S3PRL ★97 Updated 4 months ago
- Audiogen Codec ★127 Updated 4 months ago
- Transcribing Speech with Multinomial Diffusion, training code and models. ★75 Updated last year
- Zero-Shot Emotion Style Transfer ★37 Updated 7 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ★50 Updated last year
- 56 language, 1 model Multilingual ASR ★24 Updated 3 years ago
- Deep Articulatory Synthesis and Inversion ★43 Updated 9 months ago
- ★61 Updated 3 months ago
- Official code of ElasticAST (Interspeech 2024 paper) ★23 Updated 3 months ago
- SelfRemaster: SSL Speech Restoration ★87 Updated 10 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ★83 Updated last month
- ★69 Updated last year
- ★27 Updated 7 months ago
- Datasets for turn-taking research ★12 Updated 11 months ago
- ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation ★32 Updated this week