sillsdev / silnlp
A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.
☆35 · Updated last week
Related projects
Alternatives and complementary repositories for silnlp
- Curated corpus of parallel data derived from versions of the Bible provided by eBible.org. ☆54 · Updated this week
- The Unicode Cookbook for Linguists ☆53 · Updated 4 years ago
- Python API to access glottolog/glottolog ☆28 · Updated 3 weeks ago
- The EveryVoice TTS Toolkit - Text To Speech for your language ☆21 · Updated this week
- ipapy is a Python module to work with International Phonetic Alphabet (IPA) strings ☆81 · Updated 6 months ago
- CLDF: Cross-Linguistic Data Formats - the specification ☆55 · Updated 7 months ago
- A multilingual parallel corpus created from translations of the Bible. ☆176 · Updated 2 months ago
- Massively multilingual pronunciation mining ☆321 · Updated this week
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆36 · Updated last year
- Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features. ☆221 · Updated 3 months ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆27 · Updated 3 years ago
- Unicode Standard tokenization routines and orthography profile segmentation ☆33 · Updated 2 years ago
- Audiobook alignment for Indigenous languages ☆38 · Updated this week
- Python Finite-State Toolkit ☆45 · Updated last week
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆154 · Updated 3 months ago
- Perseus Treebank Data ☆70 · Updated 5 months ago
- Interlinear glossing with JS & CSS ☆18 · Updated 9 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆34 · Updated last month
- Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p! ☆135 · Updated this week
- SegBo: A database of borrowed sounds in the world’s languages ☆16 · Updated 8 months ago
- Yet another search platform for linguistic corpora. ☆19 · Updated 4 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆48 · Updated 2 months ago
- Script for workflow to add morphological analysis into ELAN files ☆13 · Updated 4 years ago
- FieldWorks is a suite of software tools for language and cultural data, with support for complex scripts. ☆83 · Updated this week
- Tools and scripts for working with ELAN ☆10 · Updated 2 years ago
- PHOIBLE Online ☆42 · Updated 2 years ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆27 · Updated 5 years ago
- Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques. ☆23 · Updated last year
- SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/ ☆56 · Updated last year