rkmt / wesper-demo
⭐27 · Updated last year
Alternatives and similar repositories for wesper-demo:
Users who are interested in wesper-demo are comparing it to the repositories listed below.
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ⭐16 · Updated last year
- Voice Activity Projection Models: Self-supervised learning of Turn-taking Events ⭐58 · Updated 10 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ⭐95 · Updated 5 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ⭐61 · Updated 3 weeks ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ⭐81 · Updated last year
- VoiceLDM: Text-to-Speech with Environmental Context ⭐172 · Updated 7 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ⭐147 · Updated last month
- ⭐55 · Updated last week
- Real-time binaural target sound extraction model. ⭐83 · Updated last year
- PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders. ⭐50 · Updated 5 months ago
- Google's SoundStorm: Efficient Parallel Audio Generation ⭐131 · Updated last year
- ⭐15 · Updated 2 years ago
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ⭐82 · Updated 2 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ⭐124 · Updated 2 weeks ago
- Unofficial implementation of miipher ⭐120 · Updated 11 months ago
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023. ⭐166 · Updated last year
- SelfRemaster: SSL Speech Restoration ⭐88 · Updated last year
- ⭐48 · Updated last week
- On-device speaker diarization powered by deep learning ⭐41 · Updated 3 weeks ago
- Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection ⭐62 · Updated last week
- Unsupervised Rhythm Modeling for Voice Conversion ⭐80 · Updated last year
- This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamic… ⭐47 · Updated 6 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ⭐68 · Updated 6 months ago
- A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories. ⭐21 · Updated 3 weeks ago
- Your one-stop solution for voice dataset creation ⭐118 · Updated last year
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ⭐95 · Updated 8 months ago
- An unofficial implementation of the Personal VAD speaker-conditioned voice activity detection method. Bachelor's thesis project. ⭐65 · Updated 2 years ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ⭐47 · Updated 3 weeks ago
- A python library for voice activity detection (VAD) for speech/non-speech segmentation. ⭐87 · Updated 2 years ago
- Speaker change detection using SincNet and an LSTM/Transformer ⭐48 · Updated 9 months ago