MarsPanther / crawl-for-parallel-corpora
Simple bs4-based web crawl for a corpus in need of statistical machine translation
☆13Updated 3 years ago
Alternatives and similar repositories for crawl-for-parallel-corpora:
Users who are interested in crawl-for-parallel-corpora are comparing it to the libraries listed below
- Morphological processing for languages of the Horn of Africa☆43Updated last week
- Different semantic models for Amharic☆17Updated last year
- Amharic/Tigrinya/Oromo Dictionaries☆38Updated last year
- ☆15Updated 5 years ago
- A JavaScript-based converter for transliterating Amharic text into Latin characters☆19Updated 2 years ago
- Lexical Data of Ge'ez Languages☆52Updated 2 years ago
- Amharic English Machine Translation Corpus prepared through website crawling and custom preprocessing.☆40Updated 6 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa.☆39Updated 2 years ago
- Linguistically analyzed Classical Tibetan texts☆26Updated 3 years ago
- Best Practices in Translation Memory Management☆45Updated 6 years ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya☆11Updated 7 years ago
- Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities☆11Updated last year
- The set of files used for the development of the Amharic Corpus.☆11Updated 7 years ago
- A Python based API to access Indian language WordNets.☆37Updated 2 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages.☆34Updated last year
- Translation Memory Open-source Purifier☆33Updated 2 years ago
- SIGTYP 2022 Shared Task☆9Updated 2 years ago
- ☆12Updated 2 years ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies)☆27Updated 3 years ago
- ☆63Updated 8 months ago
- 🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python☆60Updated last week
- Bilingual term extractor☆53Updated last year
- 😎 Curated list of Tibetan NLP projects☆36Updated 4 years ago
- A toolset for Amharic Language pre-processing. Includes an Amharic Stemmer, Transliterator, Stopword remover, Lexical analyzer, Corpus i…☆33Updated last year
- Hunspell files for Tibetan☆22Updated 9 years ago
- HORNMORPHO is a Python program that analyzes Amharic, Oromo, and Tigrinya words into their constituent morphemes (meaningful parts) and g…☆19Updated 7 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆24Updated last year
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆39Updated last year
- A library for generating Ethiopic fake data such as names, addresses, and phone numbers☆16Updated 6 years ago
- A repository for the 2022 Inflection Shared Task☆9Updated 2 years ago