abuccts / wikt2pron
A Python toolkit for converting pronunciations in the enwiktionary XML dump to CMUdict format
☆33 Updated 5 years ago
Alternatives and similar repositories for wikt2pron:
Users who are interested in wikt2pron are comparing it to the libraries listed below.
- Python Finite-State Toolkit ☆54 Updated 2 months ago
- Python module for syllabifying English ARPABET transcriptions ☆66 Updated 6 years ago
- universal syllabification algorithms ☆44 Updated 2 years ago
- ☆19 Updated 3 years ago
- ☆22 Updated 3 years ago
- English web corpus with 4M tokens and several annotation types ☆26 Updated last year
- ipapy is a Python module to work with International Phonetic Alphabet (IPA) strings ☆85 Updated last year
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆30 Updated 3 years ago
- A toolkit for producing n-gram language models. The highlights are the implementation of Kneser-Ney growing and revised Kneser pruning me… ☆40 Updated 8 months ago
- Gamma Agreement in Python ☆43 Updated last year
- fiwGAN/ciwGAN (Featural and Categorical InfoWaveGAN): Generative Adversarial Phonology and Semantics ☆23 Updated last year
- Calculates the Word Error Rate between two text files ☆20 Updated 2 years ago
- Labeled data for homograph disambiguation ☆57 Updated last year
- Unicode Standard tokenization routines and orthography profile segmentation ☆37 Updated 2 months ago
- ☆72 Updated last month
- General-Purpose Neural Networks for Sentence Boundary Detection ☆73 Updated 2 years ago
- ☆10 Updated 4 years ago
- Tool to fix bitexts and tag near-duplicates for removal ☆30 Updated 3 months ago
- phone inventory library ☆16 Updated last year
- Multilingual grapheme-to-phoneme conversion ☆20 Updated 7 years ago
- CMU dictionary in IPA instead of their subset of Arpabet ☆16 Updated 7 months ago
- Forced Alignments for Common Voice ☆31 Updated 4 years ago
- A tool for automatic spelling normalization ☆20 Updated 4 years ago
- A phoneme-allophone database for many languages ☆52 Updated 4 years ago
- Repository for sharing the data in the Tamasheq language, one of the target languages for the low-resource speech translation track at IW… ☆17 Updated 2 years ago
- Breaks a word into syllables using an LSTM-based neural network. ☆19 Updated last year
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆75 Updated last year
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- A simple neural truecaser written in pytorch and allennlp. ☆33 Updated 10 months ago
- Support tools for punctuation and boundary detection for ASR output. ☆57 Updated 2 years ago