common-voice/sentence-collector

common-voice / sentence-collector

Tool to collect and review sentences for Common Voice

☆83

Alternatives and similar repositories for sentence-collector

Users who are interested in sentence-collector compare it to the repositories listed below.

common-voice / cv-sentence-extractor
Scraping Wikipedia for fair use sentences
☆54 · Jan 25, 2024 · Updated 2 years ago
common-voice / common-voice-bundler
Script for bundling Common Voice (https://commonvoice.mozilla.org/) clips by language
☆11 · Apr 13, 2023 · Updated 3 years ago
common-voice / CorporaCreator
Command line tool to create corpora for Common Voice
☆78 · Mar 25, 2026 · Updated 4 months ago
common-voice / cv-dataset
Metadata and versioning details for the Common Voice dataset
☆173 · Jun 16, 2026 · Updated last month
common-voice / common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
☆3,475 · Updated this week
JRMeyer / common-voice-stats
A living document for all things Common Voice.
☆14 · Jun 24, 2024 · Updated 2 years ago
wisesight / newmm-tokenizer
Standalone Dictionary-based, Maximum Matching + Thai Character Cluster (newmm) tokenizer extracted from PyThaiNLP
☆13 · Jan 6, 2022 · Updated 4 years ago
common-voice / commonvoice-fr
Tooling for producing French dataset for Common Voice
☆101 · Jan 20, 2025 · Updated last year
cv-project-app / common-voice-app
Repository of "CV Project" app. It's an unofficial app for Mozilla Common Voice, which permits you to contribute to this project via your…
☆114 · Jul 8, 2026 · Updated 3 weeks ago
ftyers / commonvoice-utils
Linguistic processing for Common Voice
☆59 · Jan 18, 2024 · Updated 2 years ago
NoerNova / ShanNLP
Shan Natural Language Processing tools inspired by PyThaiNLP
☆14 · Mar 1, 2026 · Updated 4 months ago
common-voice / community-playbook
Mozilla Voice Community Playbook
☆48 · May 21, 2024 · Updated 2 years ago
vistec-AI / model-releases
☆14 · Jun 22, 2020 · Updated 6 years ago
wannaphong / thai-romanization
Deep learning for Thai romanization.
☆14 · Jul 30, 2022 · Updated 3 years ago
pnphannisa / thaimaimee
Scrape, clean and explore ThaiME dataset
☆12 · Jul 29, 2020 · Updated 6 years ago
sidataplus / pdpa
Thai PDPA Website (Unofficial)
☆12 · Updated this week
Digital-Umuganda / Deepspeech-Kinyarwanda
The Kinyarwanda model for DeepSpeech
☆17 · May 11, 2021 · Updated 5 years ago
aolney / manual-subtitle-speech-alignment
Postprocess SRT derived speech alignments for creating clean datasets for machine learning
☆17 · Jan 4, 2023 · Updated 3 years ago
nmstoker / SimpleSpeechLoop
A very basic demonstration connecting speech recognition and text-to-speech
☆20 · May 3, 2020 · Updated 6 years ago
UniversalDependencies / UD_Thai-PUD
Parallel Universal Dependencies.
☆15 · May 6, 2026 · Updated 2 months ago
mozilla-iam / dino-park-issues
DEPRECATED - Archived. Formerly a meta repository for all DinoPark issues
☆18 · May 14, 2019 · Updated 7 years ago
proger / haloop
Agent toolkit for 100 hours of speech and 10 GiB of text
☆14 · Jul 15, 2025 · Updated last year
SLSCU / thai-dialect-corpus
☆43 · May 4, 2024 · Updated 2 years ago
zerospeech / zerospeech2017
All you need to get started for the Zero Speech Challenge 2017
☆47 · Apr 23, 2019 · Updated 7 years ago
Open-Speech-EkStep / crowdsource-dataplatform
This will hold the crowdsourcing platform to be used to store voice data from various speakers which will act as input dataset for speech…
☆17 · Mar 6, 2023 · Updated 3 years ago
spicydog / thai-word-tokenizer
A web-based JavaScript tool for tokenizing Thai words
☆16 · Nov 5, 2021 · Updated 4 years ago
Open-Speech-EkStep / speech-recognition-open-api
☆13 · Dec 15, 2022 · Updated 3 years ago
chrdebru / linked-data-frontend-tutorial
A step-by-step tutorial for publishing data and an ontology as Linked Data on your machine.
☆14 · May 9, 2023 · Updated 3 years ago
ddev / ddev-drupal-solr
Apache Solr search engine integration for Drupal on DDEV (please consider ddev/ddev-solr first)
☆14 · Apr 28, 2026 · Updated 3 months ago
So-Cool / you-only-write-thrice
A companion repository to the "You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source"…
☆20 · Oct 14, 2022 · Updated 3 years ago
PyThaiNLP / nlpo3
Thai natural language processing library in Rust, with Python and Node bindings.
☆47 · Apr 12, 2026 · Updated 3 months ago
asafu-art / deepspeech-kabyle
Automatic Speech Recognition (ASR) - Kabyle
☆18 · Nov 28, 2020 · Updated 5 years ago
mondyfy / fileuploader
File uploader service to Online file storage services.
☆10 · Dec 10, 2022 · Updated 3 years ago
LibreOffice / translations-weblate
intermediate repository used by weblate - translations repository has the files used by LibreOffice
☆15 · Updated this week
Mayvenn / kafka-component
A component to consume with many threads from Kafka
☆12 · Jul 6, 2023 · Updated 3 years ago
davidar / ipfs-maps
OSM vector tiles on IPFS
☆29 · Jan 18, 2017 · Updated 9 years ago
KoichiYasuoka / spaCy-Thai
Dependency parser on Thai language
☆27 · Jan 25, 2025 · Updated last year
HarikalarKutusu / 3d-voice-chess
A voice driven 3D chess game for learning Voice AI
☆17 · Jul 6, 2022 · Updated 4 years ago
mozilla-extensions / firefox-voice
Firefox Voice is an experiment in a voice-controlled web user agent
☆292 · Jan 29, 2021 · Updated 5 years ago