sillsdev / silnlp
A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.
☆36 Updated this week
Alternatives and similar repositories for silnlp
Users who are interested in silnlp are comparing it to the libraries listed below.
- Curated corpus of parallel data derived from versions of the Bible provided by eBible.org. ☆77 Updated 5 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆51 Updated 2 years ago
- 🙊 software for creating speech recognition models. ☆159 Updated last year
- Audiobook alignment for Indigenous languages ☆42 Updated this week
- A multilingual parallel corpus created from translations of the Bible. ☆190 Updated 5 months ago
- Character-level conversion between Hebrew text and Latin transliteration using deep learning - a demonstration of seq2seq training. ☆14 Updated 2 years ago
- The Unicode Cookbook for Linguists ☆56 Updated 4 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆31 Updated 4 months ago
- SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/ ☆57 Updated 2 months ago
- Jason Riggle's chart of phonological features in JSON format + extras ☆54 Updated last year
- Massively multilingual pronunciation mining ☆354 Updated 2 months ago
- ipapy is a Python module to work with International Phonetic Alphabet (IPA) strings ☆89 Updated last year
- Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features. ☆281 Updated last week
- Python Finite-State Toolkit ☆58 Updated 2 weeks ago
- Python API to access glottolog/glottolog ☆31 Updated 4 months ago
- An NLP pipeline for Hebrew ☆39 Updated 4 months ago
- File format, model, API, and apps for manipulating text and its annotated features ☆75 Updated this week
- Unicode Standard tokenization routines and orthography profile segmentation ☆37 Updated 8 months ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆51 Updated 2 years ago
- Cross-Linguistic Transcription Systems ☆16 Updated 10 months ago
- PHOIBLE data and development. ☆136 Updated last year
- The EveryVoice TTS Toolkit - Text To Speech for your language ☆41 Updated this week
- Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support. ☆287 Updated 7 months ago
- Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language ☆16 Updated last week
- ☆29 Updated last year
- Multilingual sentence alignment using sentence embeddings ☆128 Updated 11 months ago
- Small-vocabulary neural sequence-to-sequence generation with optional feature conditioning ☆34 Updated last week
- This packages up data for the Open Multilingual Wordnet ☆55 Updated 5 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆52 Updated last month
- A corpus of diacritized Hebrew texts (טקסט מנוקד) ☆11 Updated 3 years ago