common-voice / cv-sentence-extractor
Scraping Wikipedia for fair use sentences
★52 · Updated last year
Alternatives and similar repositories for cv-sentence-extractor:
Users who are interested in cv-sentence-extractor compare it to the repositories listed below.
- Tool to collect and review sentences for Common Voice ★81 · Updated last year
- Command line tool to create corpora for Common Voice ★75 · Updated 8 months ago
- 🐸TTS recipes for different datasets ★85 · Updated 2 years ago
- A crash course for training speech recognition models using DeepSpeech. ★24 · Updated 3 years ago
- Script for bundling Common Voice (https://commonvoice.mozilla.org/) clips by language ★10 · Updated last year
- Efficient teacher-student models and scripts to make them ★49 · Updated last year
- Crawler for linguistic corpora ★199 · Updated last year
- Massively multilingual pronunciation mining ★331 · Updated 2 months ago
- Metadata and versioning details for the Common Voice dataset ★145 · Updated last month
- Linguistic processing for Common Voice ★53 · Updated last year
- Facebook AI Research Automatic Speech Recognition Toolkit ★23 · Updated 3 years ago
- ★71 · Updated last week
- ★22 · Updated 2 years ago
- Software for creating speech recognition models. ★158 · Updated 8 months ago
- MediaWiki extension allowing mass recording of clean, well cut, well named pronunciation files. ★16 · Updated 2 weeks ago
- A database of number names for 186 languages, locales, and scripts ★66 · Updated last year
- A guide to building language technology in new languages. ★58 · Updated 3 years ago
- Automatically exported from code.google.com/p/m2m-aligner ★42 · Updated 8 years ago
- Convert Arpabet to IPA. Arpabet is the set of phonemes used by the CMU Pronouncing Dictionary. IPA is the International Phonetic Alphabet… ★43 · Updated 4 years ago
- The CMU Pronouncing Dictionary converted to IPA ★80 · Updated 5 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ★34 · Updated 2 years ago
- Listening-based language learning ★53 · Updated last year
- Spoken Language Identification on Common Voice and AudioSet using Deep Learning ★37 · Updated 2 years ago
- Bitextor generates translation memories from multilingual websites ★293 · Updated 3 months ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ★154 · Updated 7 months ago
- Tool for creation, manipulation and maintenance of voice corpora ★81 · Updated 9 months ago
- Data and code for grapheme-to-phoneme transducers in lots of languages ★131 · Updated 10 months ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ★61 · Updated last month
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ★49 · Updated last month
- Universal Romanizer that can convert any unicode script to roman (latin) script ★174 · Updated 6 months ago