MarsPanther / crawl-for-parallel-corpora
Simple bs4-based web crawler for collecting a parallel corpus for statistical machine translation
☆13Updated 3 years ago
Related projects
Alternatives and complementary repositories for crawl-for-parallel-corpora
- Morphological processing for languages of the Horn of Africa☆41Updated this week
- ☆15Updated 4 years ago
- A JavaScript-based converter for transliterating Amharic text into Latin characters☆19Updated 2 years ago
- Different semantic models for Amharic☆17Updated 10 months ago
- Amharic-English machine translation corpus prepared through website crawling and custom preprocessing.☆39Updated 6 years ago
- Resources to go with the Indic NLP Library☆72Updated 2 years ago
- ☆63Updated 6 months ago
- The set of files used for the development of the Amharic Corpus.☆11Updated 7 years ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya☆11Updated 7 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages.☆34Updated last year
- Translation Memory Open-source Purifier☆33Updated 2 years ago
- A small Python script that transliterates Arabic text using the Buckwalter Transliteration Scheme. It allows for multiple decisions to be…☆27Updated 10 years ago
- Material for the Text Analysis of Arabic course taught at the NYU Abu Dhabi Winter Institute in Digital Humanities 2020.☆12Updated 4 years ago
- Xlit-Crowd: Hindi-English Transliteration Corpus☆37Updated 9 years ago
- Indian Language Tagger and Chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali)☆40Updated last year
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa.☆40Updated 2 years ago
- ElixirFM Functional Arabic Morphology☆43Updated last year
- Hindi POS Tags and keywords using TNT model. Created Date: 28 Sept 2018☆24Updated 4 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span…☆70Updated this week
- A rule-based iterative affix stripping stemmer for Tamil☆43Updated 6 years ago
- Arabic Transliteration in Python☆33Updated 11 years ago
- Arabic Dialects Segmenter Using Keras/BiLSTM/ChainCRF☆10Updated 4 years ago
- A Python based API to access Indian language WordNets.☆37Updated 2 years ago
- Crawler for linguistic corpora☆192Updated 11 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆36Updated last year
- ☆43Updated 9 years ago
- Preprocess Arabic text (remove diacritics, punctuation, and repeating characters)☆105Updated 7 years ago
- Wiktra: Python tool providing Wiktionary transliteration modules for 514 languages and their 102 different scripts (orthographies)☆27Updated 3 years ago
- ☆14Updated 2 years ago
- A collection of basic text processing modules focused on Gujarati☆10Updated 7 years ago