MarsPanther / crawl-for-parallel-corpora
A simple bs4-based web crawl for a corpus in need of statistical machine translation
☆13 · Updated 3 years ago
Alternatives and similar repositories for crawl-for-parallel-corpora
Users who are interested in crawl-for-parallel-corpora are comparing it to the libraries listed below
- Best Practices in Translation Memory Management · ☆46 · Updated 6 years ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya · ☆11 · Updated 8 years ago
- A JavaScript-based converter for transliterating Amharic text into Latin characters · ☆19 · Updated 3 years ago
- The set of files used for the development of the Amharic Corpus. · ☆11 · Updated 8 years ago
- Morphological processing for languages of the Horn of Africa · ☆46 · Updated this week
- A Python based API to access Indian language WordNets. · ☆38 · Updated 3 years ago
- Python library to use Google Transliterate API which powers the G Input Tools · ☆22 · Updated 4 years ago
- Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/… · ☆60 · Updated 4 years ago
- Translation Memory Open-source Purifier · ☆34 · Updated 2 years ago
- Sanskrit compound segmentation using seq2seq model · ☆25 · Updated 6 years ago
- Python package for indic script transliteration · ☆189 · Updated last week
- Natural Language Processing Tutorials(NLP) with Julia and Python · ☆243 · Updated 5 months ago
- Pre-process Arabic text (remove diacritics, punctuation, and repeating characters) · ☆107 · Updated 8 years ago
- Machine Translation for Africa · ☆289 · Updated 3 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. · ☆40 · Updated 2 years ago
- ThamizhiMorph: A Tamil Morphological Analyser and Generator · ☆18 · Updated last year
- ElixirFM Functional Arabic Morphology · ☆44 · Updated 2 years ago
- Machine Translation from Sanskrit to Hindi using Unsupervised and Supervised Learning · ☆19 · Updated 4 years ago
- HORNMORPHO is a Python program that analyzes Amharic, Oromo, and Tigrinya words into their constituent morphemes (meaningful parts) and g… · ☆20 · Updated 7 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. · ☆34 · Updated 2 years ago
- 3000+ machine-readable open source dictionaries distributed by the Applied Computational Linguistics lab at the University of Augsburg, G… · ☆13 · Updated 2 years ago
- Generating multiple choice questions from text using Machine Learning. · ☆489 · Updated last year
- Rasa for Beginners · ☆60 · Updated 4 years ago
- Resources to go with the Indic NLP Library · ☆73 · Updated 3 years ago
- A project to collect all Tamil nouns · ☆11 · Updated 7 months ago
- Using NLP and LDA for Topic Modeling and Sentiment Analysis · ☆43 · Updated 4 years ago
- ☆159 · Updated 2 years ago
- Transliterating English to Hindi using Recurrent Neural Networks · ☆45 · Updated 8 years ago
- A Machine Learning project to translate Sanskrit text to English · ☆50 · Updated 7 years ago
- Linguistically analyzed Classical Tibetan texts · ☆26 · Updated 4 years ago