zakuro-ai / asr
ASRDeepspeech x Sakura-ML (English/Japanese): a DeepSpeech2 model in PyTorch, with support from Zakuro AI.
☆69Updated 2 years ago
Alternatives and similar repositories for asr
Users who are interested in asr are comparing it to the libraries listed below.
- context labels and pronunciation data for JSUT corpus☆73Updated 3 years ago
- ESPnet Model Zoo☆255Updated 2 years ago
- ☆225Updated last year
- Python wrapper for OpenJTalk☆227Updated 4 months ago
- ONNX wrapper for ESPnet inference model☆168Updated this week
- ☆87Updated 4 years ago
- One-button-press forced aligner for Japanese, using Julius.☆46Updated 2 years ago
- Repository for the paper: VoiceMe: Personalized voice generation in TTS☆125Updated 3 years ago
- Deep neural network (DNN) for noise reduction, removal of background music, and speech separation☆172Updated 2 years ago
- VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram☆254Updated last year
- Real-time Japanese speech recognition translator using wav2vec2☆39Updated 3 years ago
- Voice based gender recognition using Mel-frequency cepstrum coefficients (MFCC) and Gaussian mixture models (GMM)☆218Updated 2 years ago
- ☆32Updated 2 years ago
- JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech☆111Updated 3 years ago
- Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow☆129Updated 4 years ago
- PyTorch Implementation of FastSpeech 2 : Fast and High-Quality End-to-End Text to Speech☆229Updated 3 years ago
- [WIP] Scripts for fine-tuning Whisper☆221Updated 2 years ago
- xvector model on jtubespeech☆45Updated last year
- Neural HMMs are all you need (for high-quality attention-free TTS)☆159Updated this week
- ☆27Updated 4 years ago
- PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor, adapted to support 44100 Hz Japanese audio sources.☆20Updated 2 years ago
- VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network☆319Updated last year
- ☆199Updated 3 years ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO☆64Updated 2 years ago
- A public domain single speaker Japanese speech dataset☆54Updated last year
- An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io☆69Updated last year
- Application of MB-iSTFT-VITS components to vits2_pytorch☆128Updated 8 months ago
- PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean,…☆304Updated 3 years ago
- Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen…☆241Updated 3 years ago
- Library to build speech synthesis systems designed for easy and fast prototyping.☆398Updated last year