MarsPanther / crawl-for-parallel-corpora
Simple bs4-based web crawler for building a corpus for statistical machine translation
☆13 Updated 3 years ago
Alternatives and similar repositories for crawl-for-parallel-corpora
Users who are interested in crawl-for-parallel-corpora are comparing it to the libraries listed below
- Morphological processing for languages of the Horn of Africa ☆45 Updated 3 months ago
- ☆15 Updated 5 years ago
- A JavaScript-based converter for transliterating Amharic text into Latin characters ☆19 Updated 3 years ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya ☆11 Updated 8 years ago
- Different semantic models for Amharic ☆19 Updated last year
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆80 Updated 5 months ago
- The set of files used for the development of the Amharic Corpus. ☆11 Updated 7 years ago
- Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities ☆13 Updated 2 years ago
- Translation Memory Open-source Purifier ☆34 Updated 2 years ago
- Best Practices in Translation Memory Management ☆45 Updated 6 years ago
- Amharic-English machine translation corpus prepared through website crawling and custom preprocessing. ☆43 Updated 6 years ago
- ☆64 Updated last year
- A small Python script that transliterates Arabic text using the Buckwalter Transliteration Scheme. It allows for multiple decisions to be… ☆26 Updated 11 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆39 Updated 2 years ago
- Resources to go with the Indic NLP Library ☆73 Updated 2 years ago
- YouTube comment topic modeling and sentiment analyzer ☆16 Updated 2 years ago
- Arabic named entity recognition using the AnerCorp corpus (location, organisation, person, miscellaneous word) ☆37 Updated 7 years ago
- ☆43 Updated 9 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- Pre-process Arabic text (remove diacritics, punctuation, and repeating characters) ☆106 Updated 8 years ago
- Hotels Arabic-Reviews Dataset ☆32 Updated 6 years ago
- This buckwalter2unicode script is designed to convert Arabic text that has been transliterated to ASCII symbols using the Buckwalter Tran… ☆13 Updated 12 years ago
- A Python based API to access Indian language WordNets. ☆39 Updated 3 years ago
- BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique ☆27 Updated 4 years ago
- Indian Language Tagger and Chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali) ☆42 Updated 2 years ago
- ☆14 Updated 4 years ago
- Tools to normalise and derive sentiment from Arabic text ☆28 Updated 7 years ago
- Benchmark Arabic text diacritization dataset ☆75 Updated 5 years ago
- Sentiment analysis models for Arabic tweets that classify Twitter comments as positive, negative, or neutral. ☆13 Updated 7 years ago
- Code and models for "The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models". EACL 2021, WANLP. ☆46 Updated 10 months ago