sillsdev / silnlp
A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.
☆30 Updated this week
Related projects:
- Curated corpus of parallel data derived from versions of the Bible provided by eBible.org. ☆51 Updated last month
- ☆19 Updated 2 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation ☆23 Updated last year
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆42 Updated last year
- Small-vocabulary sequence-to-sequence generation with optional feature conditioning ☆29 Updated last week
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆36 Updated last year
- 🙊 software for creating speech recognition models. ☆152 Updated 3 months ago
- Python API to access glottolog/glottolog ☆27 Updated 6 months ago
- Audiobook alignment for Indigenous languages ☆34 Updated this week
- ☆26 Updated 3 months ago
- Python Finite-State Toolkit ☆39 Updated last month
- Script for workflow to add morphological analysis into ELAN files ☆13 Updated 4 years ago
- A lexicon compiler for non-suffixational morphologies ☆11 Updated 2 months ago
- Yet another search platform for linguistic corpora. ☆19 Updated 2 months ago
- LoanPy is a linguistic toolkit for rule-based prediction and evaluation of loanword adaptation and historical reconstructions and can be … ☆15 Updated 6 months ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆31 Updated 2 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆32 Updated 6 months ago
- A multilingual parallel corpus created from translations of the Bible. ☆172 Updated 3 months ago
- The Unicode Cookbook for Linguists ☆53 Updated 3 years ago
- A repository for the 2022 Inflection Shared Task ☆9 Updated 2 years ago
- A character-wise tokenizer for morphologically rich languages ☆27 Updated 3 months ago
- ☆67 Updated last month
- Efficient Low-Memory Aligner ☆135 Updated 2 weeks ago
- Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques. ☆22 Updated 11 months ago
- ipapy is a Python module to work with International Phonetic Alphabet (IPA) strings ☆81 Updated 4 months ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆27 Updated 3 years ago
- python package to read and write CLDF datasets ☆15 Updated last week
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆45 Updated 2 weeks ago
- Massively multilingual pronunciation mining ☆315 Updated 2 weeks ago
- Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features. ☆213 Updated last month