NVIDIA / NeMo-speech-data-processor
A toolkit for processing speech data and creating speech datasets
☆103 · Updated last week
Alternatives and similar repositories for NeMo-speech-data-processor:
Users who are interested in NeMo-speech-data-processor are comparing it to the libraries listed below.
- ☆63 · Updated last month
- A TTS model that makes a speaker speak new languages · ☆75 · Updated 7 months ago
- NeMo text processing for ASR and TTS · ☆297 · Updated last week
- Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva · ☆83 · Updated last month
- ☆84 · Updated 9 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… · ☆90 · Updated 3 months ago
- Speaker change detection using SincNet and an LSTM/Transformer · ☆46 · Updated 6 months ago
- Tunable pipelines · ☆31 · Updated this week
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. · ☆77 · Updated last year
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context · ☆183 · Updated 4 months ago
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… · ☆71 · Updated last week
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction · ☆244 · Updated 8 months ago
- VoiceBox neural network implementation · ☆100 · Updated 5 months ago
- ONNX and TensorRT implementation of Whisper · ☆61 · Updated last year
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation · ☆139 · Updated last year
- An unofficial PyTorch implementation of VALL-E · ☆87 · Updated this week
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · ☆102 · Updated last month
- Implementation of Google's USM speech model in PyTorch · ☆27 · Updated 2 months ago
- ☆56 · Updated 2 years ago
- ☆60 · Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models · ☆109 · Updated this week
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023. · ☆158 · Updated 10 months ago
- Various speech datasets made available to the public · ☆107 · Updated last month
- JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech · ☆105 · Updated 2 years ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) · ☆140 · Updated last year
- UTokyo-SaruLab MOS Prediction System · ☆127 · Updated last month
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E · ☆136 · Updated 2 months ago
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models · ☆147 · Updated last year
- ☆35 · Updated 3 months ago