common-voice/CorporaCreator

Command-line tool to create corpora for Common Voice

☆78

Alternatives and similar repositories for CorporaCreator

Users who are interested in CorporaCreator are comparing it to the libraries listed below.

common-voice / common-voice-bundler
Script for bundling Common Voice (https://commonvoice.mozilla.org/) clips by language
☆11 · Apr 13, 2023 · Updated 3 years ago
ftyers / commonvoice-utils
Linguistic processing for Common Voice
☆59 · Jan 18, 2024 · Updated 2 years ago
TalnUPF / praat_web
☆13 · Jun 30, 2026 · Updated 3 weeks ago
common-voice / sentence-collector
Tool to collect and review sentences for Common Voice
☆83 · May 10, 2023 · Updated 3 years ago
talhanai / kaldi-diar-latte
Steps to perform text-based speaker diarization with the Kaldi toolkit
☆12 · Nov 2, 2018 · Updated 7 years ago
tiro-is / tiro-speech-core
This is a mirror of https://gitlab.com/tiro-is/tiro-speech-core
☆15 · Jun 19, 2023 · Updated 3 years ago
i3thuan5 / FaNT
Filtering and Noise Adding Tool
☆29 · May 27, 2022 · Updated 4 years ago
MycroftAI / pylisten
A simple pyaudio microphone interface
☆11 · Jul 27, 2018 · Updated 8 years ago
mozilla / DSAlign
DeepSpeech based forced alignment tool
☆239 · Dec 12, 2020 · Updated 5 years ago
JRMeyer / common-voice-stats
A living document for all things Common Voice.
☆14 · Jun 24, 2024 · Updated 2 years ago
AI-Lab-Makerere / Data4Good
This repository contains publicly available speech and text data in Luganda.
☆12 · Sep 4, 2020 · Updated 5 years ago
common-voice / cv-sentence-extractor
Scraping Wikipedia for fair use sentences
☆54 · Jan 25, 2024 · Updated 2 years ago
aalto-speech / subword-kaldi
Properly handle position-dependent phones in a subword lexicon FST
☆31 · Oct 26, 2020 · Updated 5 years ago
WangHelin1997 / Aty-TTS
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11 · May 14, 2025 · Updated last year
Idlak / Living-Audio-Dataset
A "Crowd-Built" continuously growing speech dataset with transcripts. The dataset contains multiple languages and is intended for anyone …
☆43 · Aug 3, 2022 · Updated 3 years ago
KathyReid / opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
☆28 · Sep 23, 2022 · Updated 3 years ago
domcross / german-stt-evaluation
Evaluation of STT models for the German language
☆16 · Jan 22, 2022 · Updated 4 years ago
smaybius / Coqui-TTS-GUI-solution
Interface for using TTS and vocoder models in the form of a text editor
☆20 · Nov 25, 2025 · Updated 8 months ago
NTRLab / MediaSpeech
☆22 · Jul 22, 2022 · Updated 4 years ago
mozilla / deepspeech-playbook
DEPRECATED - A crash course for training speech recognition models using DeepSpeech.
☆24 · May 16, 2021 · Updated 5 years ago
enebo / rpiet
Piet language in Ruby
☆13 · Nov 12, 2024 · Updated last year
simoninithomas / jammo_the_robot
☆22 · Sep 16, 2021 · Updated 4 years ago
idnavid / speech_activity_detection
Unsupervised speech activity detection system.
☆11 · Jul 2, 2018 · Updated 8 years ago
coqui-ai / open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
☆1,398 · Jun 6, 2024 · Updated 2 years ago
cadia-lvl / punctuation-prediction
Support tools for punctuation and boundary detection for ASR output.
☆55 · Dec 8, 2022 · Updated 3 years ago
naver / multilingual-distilwhisper
This repository contains all the code necessary for running the multilingual distilwhisper from Ferraz et al. 2024 IEEE ICASSP paper.
☆34 · Apr 22, 2026 · Updated 3 months ago
motazsaad / ara-pronunciation-tool
A python tool that converts Arabic diacritised text to a sequence of phonemes and creates a pronunciation dictionary. This code is based …
☆15 · Sep 5, 2017 · Updated 8 years ago
yuhaozhang / nnjm-global
A python implementation of the neural network joint language model and an extension of it using global source context.
☆11 · May 17, 2017 · Updated 9 years ago
erikernst4 / entrainment-metrics
Acoustic-prosodic entrainment measurement in spoken dialogue and approximation of the evolution of a speaker’s a/p features.
☆14 · Feb 26, 2024 · Updated 2 years ago
coqui-ai / inference-engine
Coqui Inference Engine
☆41 · Aug 3, 2021 · Updated 4 years ago
ua-datalab / NLP-Speech
The repository for U of A Datalab’s “NLP for All” workshop series, where we cover the basics of Natural Language Processing (NLP) and its…
☆11 · Aug 8, 2025 · Updated 11 months ago
chmodsss / noizeus_corpora
Speech corpora for the speech recognition evaluation system
☆21 · Mar 20, 2018 · Updated 8 years ago
common-voice / common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
☆3,475 · Updated this week
mayukhnair / deepspeech-colab
Running Mozilla's implementation of Baidu DeepSpeech on Google Colaboratory
☆16 · Mar 18, 2019 · Updated 7 years ago
sarulab-speech / jtubespeech
☆233 · Nov 13, 2023 · Updated 2 years ago
nmstoker / SimpleSpeechLoop
A very basic demonstration connecting speech recognition and text-to-speech
☆20 · May 3, 2020 · Updated 6 years ago
asappresearch / multistream-cnn
Multistream CNN for Robust Acoustic Modeling
☆40 · Jun 17, 2021 · Updated 5 years ago
gullabi / STT-align
Coqui STT (🐸STT) based forced alignment tool
☆13 · Feb 24, 2022 · Updated 4 years ago
wearespindle / django-ranged-fileresponse
This is a modified FileResponse that returns `Content-Range` headers with the HTTP response, so browsers (read Safari 9+) that request th…
☆20 · Nov 10, 2022 · Updated 3 years ago