MarsPanther / crawl-for-parallel-corpora
A simple bs4-based web crawler for building a corpus for statistical machine translation
☆13 Updated 3 years ago
Alternatives and similar repositories for crawl-for-parallel-corpora:
Users who are interested in crawl-for-parallel-corpora are comparing it to the libraries listed below.
- Morphological processing for languages of the Horn of Africa ☆45 Updated 2 months ago
- Amharic-English machine translation corpus prepared through website crawling and custom preprocessing. ☆42 Updated 6 years ago
- Different semantic models for Amharic ☆17 Updated last year
- Lexical Data of Ge'ez Languages ☆54 Updated 2 years ago
- ☆15 Updated 5 years ago
- Amharic/Tigrinya/Oromo Dictionaries ☆38 Updated last year
- A JavaScript-based converter for transliterating Amharic text into Latin characters ☆19 Updated 3 years ago
- Best Practices in Translation Memory Management ☆45 Updated 6 years ago
- ☆64 Updated 10 months ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya ☆11 Updated 8 years ago
- Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities ☆12 Updated last year
- Translation Memory Open-source Purifier ☆34 Updated 2 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated last year
- Machine Translation (MT) Preparation Scripts ☆31 Updated last month
- HORNMORPHO is a Python program that analyzes Amharic, Oromo, and Tigrinya words into their constituent morphemes (meaningful parts) and g… ☆19 Updated 7 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆77 Updated 4 months ago
- List of research and engineering of NLP for American Native/Indigenous Languages. ☆87 Updated 4 years ago
- Hindi POS Tags and keywords using TNT model. Created Date: 28 Sept 2018 ☆25 Updated 5 years ago
- Arabic Dialect Identification on AOC data. ☆24 Updated 6 years ago
- The set of files used for the development of the Amharic Corpus. ☆11 Updated 7 years ago
- Pre-trained Mongolian BERT models ☆46 Updated 4 years ago
- Benchmark Arabic text diacritization dataset ☆74 Updated 5 years ago
- Sentence aligner ☆112 Updated 3 years ago
- Bilingual term extractor ☆53 Updated last year
- Machine Translation (MT) Web Interface for OpenNMT and FairSeq models using CTranslate and Streamlit ☆15 Updated 3 years ago
- SIGTYP 2022 Shared Task ☆9 Updated 2 years ago
- Script for workflow to add morphological analysis into ELAN files ☆13 Updated 4 years ago
- Linguistically analyzed Classical Tibetan texts ☆26 Updated 3 years ago
- Open information and community for machine translation ☆74 Updated last week
- Useful resources for Mongolian NLP ☆181 Updated 3 months ago