NVIDIA / NeMo-speech-data-processor
A toolkit for processing speech data and creating speech datasets
☆75 · Updated last week
Related projects:
- A TTS model that makes a speaker speak new languages ☆73 · Updated 3 months ago
- ☆48 · Updated last month
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆74 · Updated 2 months ago
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models ☆135 · Updated 9 months ago
- Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva ☆78 · Updated last month
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆74 · Updated 11 months ago
- VoiceBox neural network implementation ☆88 · Updated last month
- ☆58 · Updated 10 months ago
- An unofficial PyTorch implementation of VALL-E ☆68 · Updated this week
- Speaker change detection using SincNet and an LSTM/Transformer ☆39 · Updated 2 months ago
- ☆17 · Updated last year
- This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingfac… ☆43 · Updated 2 months ago
- Unofficial implementation of Miipher ☆104 · Updated 5 months ago
- Monotonic Alignment Search ☆83 · Updated 2 years ago
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch. ☆80 · Updated 11 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆127 · Updated last year
- PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supp… ☆45 · Updated last year
- A JAX library for building lattice-based speech transducer models ☆39 · Updated 5 months ago
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆64 · Updated last year
- ☆84 · Updated 5 months ago
- Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU 2023) ☆26 · Updated 11 months ago
- ☆27 · Updated 10 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆64 · Updated 11 months ago
- Finetuning VITS Efficiently ☆31 · Updated 10 months ago
- VALL-E 2 reproduction ☆72 · Updated 2 months ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆34 · Updated last year
- CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus ☆178 · Updated 2 years ago
- Official implementation of Vec-Tok Speech ☆91 · Updated 11 months ago
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP 2023. ☆149 · Updated 6 months ago
- NeMo text processing for ASR and TTS ☆266 · Updated this week