NVIDIA/NeMo-speech-data-processor

NVIDIA / NeMo-speech-data-processor

A toolkit for processing speech data and creating speech datasets

☆212

Alternatives and similar repositories for NeMo-speech-data-processor

Users who are interested in NeMo-speech-data-processor are comparing it to the libraries listed below.

NVIDIA / NeMo-text-processing
NeMo text processing for ASR and TTS
☆484 · Updated this week
k2-fsa / kaldi-decoder
Decoders from Kaldi using OpenFst
☆35 · Apr 10, 2026 · Updated 3 months ago
yukara-ikemiya / Open-Miipher-2
PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind
☆70 · Sep 22, 2025 · Updated 9 months ago
ag1988 / mel-asr
The accompanying code for "Exploring the limits of decoder-only models trained on public speech recognition corpora" (Ankit Gupta, George…
☆21 · Oct 11, 2024 · Updated last year
p1an-lin-jung / wv_tts
☆19 · Mar 22, 2024 · Updated 2 years ago
lifeiteng / Aligner-SUPERB
Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark
☆39 · May 7, 2025 · Updated last year
yangdongchao / SimpleSpeech
The open source code for SimpleSpeech series
☆147 · Oct 8, 2024 · Updated last year
ex3ndr / supervoice-hybrid
My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26 · Aug 5, 2024 · Updated last year
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
ictnlp / SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108 · May 20, 2025 · Updated last year
leto19 / WhiSQA
Whisper Speech Quality Assessment (WhiSQA)
☆16 · Apr 14, 2026 · Updated 3 months ago
google-research / last
A JAX library for building lattice-based speech transducer models
☆48 · Jul 2, 2026 · Updated 2 weeks ago
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
Ereboas / MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆124 · Jun 4, 2025 · Updated last year
lhotse-speech / lhotse
Tools for handling multimodal data in machine learning projects.
☆1,143 · Jun 22, 2026 · Updated 3 weeks ago
google-deepmind / librispeech-long
LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …
☆98 · Dec 28, 2024 · Updated last year
asappresearch / simple-tts
Contains the code associated with the ICLR submission for our text-to-speech diffusion model
☆57 · Oct 31, 2023 · Updated 2 years ago
X-LANCE / UniCATS-CTX-vec2wav
[AAAI 2024] Code for CTX-vec2wav in UniCATS
☆130 · Jun 11, 2024 · Updated 2 years ago
wavlab-speech / versa
Versatile Evaluation of Speech and Audio
☆423 · Updated this week
SpeechColab / GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
☆197 · Apr 28, 2026 · Updated 2 months ago
PlayVoice / BigVGAN
BigVGAN with Neural Source-Filter
☆58 · Sep 21, 2023 · Updated 2 years ago
ICASSP2021-tutorial9 / Distant_conversational_ASR_and_analysis
☆12 · Jun 10, 2021 · Updated 5 years ago
ddlBoJack / MT4SSL
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…
☆45 · Mar 25, 2024 · Updated 2 years ago
innnky / descript-audio-vae
VAE modified from Descript Audio Codec, which replaces the RVQ with VAE
☆92 · Apr 2, 2024 · Updated 2 years ago
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
ZhikangNiu / A-DMA
[INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
☆67 · Jun 16, 2025 · Updated last year
SonyResearch / VRVQ
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54 · May 1, 2025 · Updated last year
light1726 / BetaVAE_VC
Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using ß-VAE"
☆43 · Apr 10, 2023 · Updated 3 years ago
inclusionAI / MingTok-Audio
☆88 · Feb 24, 2026 · Updated 4 months ago
ArenAcikgoz / Whisper-Alignment
Forced alignment decoder for Whisper.
☆16 · Mar 13, 2024 · Updated 2 years ago
lumaku / ctc-segmentation
Segment an audio file and obtain utterance alignments. (Python package)
☆348 · May 15, 2024 · Updated 2 years ago
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
Takaaki-Saeki / DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
☆185 · Dec 5, 2024 · Updated last year
KakaruHayate / CODEY_Dataset
A third-party singing voice dataset of 泠鸢yousa
☆19 · Jun 23, 2026 · Updated 3 weeks ago
sp-nitech / diffsptk
A differentiable version of SPTK
☆201 · Jul 14, 2026 · Updated last week
k2-fsa / text_search
Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
☆79 · Jun 30, 2025 · Updated last year
facebookresearch / spidr
This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…
☆57 · Updated this week
joonaskalda / PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings…
☆105 · Jan 10, 2025 · Updated last year
thuhcsi / SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆197 · Feb 28, 2026 · Updated 4 months ago