common-voice / cv-sentence-extractor
Scraping Wikipedia for fair use sentences
⭐52 · Updated 11 months ago
Alternatives and similar repositories for cv-sentence-extractor:
Users who are interested in cv-sentence-extractor are comparing it to the repositories listed below.
- Tool to collect and review sentences for Common Voice ⭐81 · Updated last year
- MediaWiki extension allowing mass recording of clean, well cut, well named pronunciation files. ⭐16 · Updated last week
- Command line tool to create corpora for Common Voice ⭐75 · Updated 7 months ago
- A crash course for training speech recognition models using DeepSpeech. ⭐24 · Updated 3 years ago
- Massively multilingual pronunciation mining ⭐327 · Updated last month
- Open Source AI Benchmarking toolkit for benchmarking speech to text services ⭐55 · Updated 9 months ago
- The CMU Pronouncing Dictionary converted to IPA ⭐78 · Updated 5 years ago
- Efficient teacher-student models and scripts to make them ⭐49 · Updated last year
- Linguistic processing for Common Voice ⭐52 · Updated last year
- Convert Arpabet to IPA. Arpabet is the set of phonemes used by the CMU Pronouncing Dictionary. IPA is the International Phonetic Alphabet… ⭐43 · Updated 4 years ago
- 🐸TTS recipes for different datasets ⭐85 · Updated 2 years ago
- A tool for automatic phoneme transcription ⭐157 · Updated last year
- The Unicode Cookbook for Linguists ⭐53 · Updated 4 years ago
- Tool for creation, manipulation and maintenance of voice corpora ⭐81 · Updated 8 months ago
- Metadata and versioning details for the Common Voice dataset ⭐145 · Updated last month
- Python wrapper for phonetisaurus grapheme to phoneme tool ⭐12 · Updated 3 years ago
- Python library for handling audio datasets. ⭐136 · Updated last year
- Wiktionary parser tool for many language editions. ⭐53 · Updated 2 years ago
- Labeled data for homograph disambiguation ⭐54 · Updated last year
- 24-hour Automatic Speech Recognition ⭐27 · Updated 3 years ago
- Script for bundling Common Voice (https://commonvoice.mozilla.org/) clips by language ⭐10 · Updated last year
- Automatically exported from code.google.com/p/m2m-aligner ⭐42 · Updated 8 years ago
- Software for creating speech recognition models. ⭐154 · Updated 7 months ago
- ⭐42 · Updated 7 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ⭐34 · Updated 2 years ago
- Gecko - A Tool for Effective Annotation of Human Conversations ⭐279 · Updated last year
- ⭐22 · Updated 2 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ⭐34 · Updated last year
- Check your data, before you wreck your model ⭐16 · Updated 2 years ago
- The Global WordNet Association Collaborative Inter-Lingual Index ⭐41 · Updated 2 months ago