common-voice / cv-sentence-extractor
Scraping Wikipedia for fair use sentences
☆54 · Updated last year
Alternatives and similar repositories for cv-sentence-extractor
Users who are interested in cv-sentence-extractor are comparing it to the libraries listed below.
- Tool to collect and review sentences for Common Voice ☆81 · Updated 2 years ago
- Command line tool to create corpora for Common Voice ☆77 · Updated last year
- 🐸TTS recipes for different datasets ☆86 · Updated 2 years ago
- A crash course for training speech recognition models using DeepSpeech. ☆25 · Updated 4 years ago
- Convert Arpabet to IPA. Arpabet is the set of phonemes used by the CMU Pronouncing Dictionary. IPA is the International Phonetic Alphabet… ☆44 · Updated 4 years ago
- Metadata and versioning details for the Common Voice dataset ☆150 · Updated 3 weeks ago
- 🙊 software for creating speech recognition models. ☆159 · Updated last year
- Massively multilingual pronunciation mining ☆344 · Updated last month
- British English pronunciation dictionary ☆95 · Updated 7 years ago
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆214 · Updated 11 months ago
- A tool for automatic phoneme transcription ☆157 · Updated 2 years ago
- Linguistic processing for Common Voice ☆55 · Updated last year
- Mozilla Voice Community Playbook ☆47 · Updated last year
- A tokenizer, text cleaner, and phonemizer for many human languages. ☆320 · Updated 8 months ago
- ipapy is a Python module to work with International Phonetic Alphabet (IPA) strings ☆87 · Updated last year
- A guide to building language technology in new languages. ☆58 · Updated 3 years ago
- 🐸STT integration examples ☆129 · Updated 2 years ago
- Tool for creation, manipulation and maintenance of voice corpora ☆81 · Updated last year
- Crawler for linguistic corpora ☆204 · Updated last year
- Datasets and tools for basic natural language processing. ☆384 · Updated 3 years ago
- Gecko - A Tool for Effective Annotation of Human Conversations ☆291 · Updated 2 years ago
- A database of number names for 186 languages, locales, and scripts ☆67 · Updated 2 years ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆63 · Updated last week
- Open Source AI Benchmarking toolkit for benchmarking speech to text services ☆56 · Updated last year
- The Unicode Cookbook for Linguists ☆54 · Updated 4 years ago
- A code for transliterating (romanizing) Arabic text using the American Library Association - Library of Congress (ALA-LC) standard ☆47 · Updated 3 years ago
- Indian Language Tagger and Chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali) ☆41 · Updated 2 years ago
- Python library for handling audio datasets. ☆137 · Updated 2 years ago
- Model for recasing and repunctuating ASR transcripts ☆135 · Updated last year
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆47 · Updated 2 years ago