MarsPanther / crawl-for-parallel-corpora
Simple bs4-based web crawler for building a parallel corpus for statistical machine translation
☆13 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for crawl-for-parallel-corpora
- Morphological processing for languages of the Horn of Africa ☆40 · Updated last week
- A JavaScript-based converter for transliterating Amharic text into Latin characters ☆19 · Updated 2 years ago
- Amharic/Tigrinya/Oromo Dictionaries ☆37 · Updated last year
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 · Updated last year
- The set of files used for the development of the Amharic Corpus. ☆11 · Updated 7 years ago
- Different semantic models for Amharic ☆17 · Updated 9 months ago
- ☆15 · Updated 4 years ago
- Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language ☆15 · Updated this week
- Amharic English Machine Translation Corpus prepared through website crawling and custom preprocessing. ☆39 · Updated 6 years ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆91 · Updated 6 months ago
- Lexical Data of Ge'ez Languages ☆51 · Updated 2 years ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya ☆11 · Updated 7 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆40 · Updated 2 years ago
- An NLP pipeline for Hebrew ☆34 · Updated 7 months ago
- HORNMORPHO is a Python program that analyzes Amharic, Oromo, and Tigrinya words into their constituent morphemes (meaningful parts) and g… ☆19 · Updated 6 years ago
- Audiobook alignment for Indigenous languages ☆37 · Updated this week
- ☆63 · Updated 5 months ago
- Aksharamukha Python Library ☆43 · Updated 3 weeks ago
- Tools for assessing Finnish poetry: rhymes, meter, hyphenation of Finnish and so on. ☆11 · Updated 11 months ago
- Android software for recording and translation ☆30 · Updated 8 years ago
- Python API to access glottolog/glottolog ☆28 · Updated 2 weeks ago
- SIGTYP 2022 Shared Task ☆9 · Updated 2 years ago
- LoanPy is a linguistic toolkit for rule-based prediction and evaluation of loanword adaptation and historical reconstructions and can be … ☆15 · Updated 8 months ago
- ☆23 · Updated 3 years ago
- ☆48 · Updated 2 years ago
- Check the grammar of a given sentence using BERT and ULMFIT. ☆15 · Updated 3 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆27 · Updated 3 years ago
- Repository for Philippine language dictionary data ☆19 · Updated last year
- Named entity relevant project ☆30 · Updated 4 years ago
- Official releases of the PROIEL treebank of ancient Indo-European languages ☆36 · Updated last year