MarsPanther / crawl-for-parallel-corpora
A simple bs4-based web crawler for building a corpus for statistical machine translation
☆13 · Updated 4 years ago
Alternatives and similar repositories for crawl-for-parallel-corpora
Users who are interested in crawl-for-parallel-corpora are comparing it to the libraries listed below.
- Morphological analysis for Udmurt. ☆12 · Updated 3 months ago
- A JavaScript-based converter for transliterating Amharic text into Latin characters ☆19 · Updated 4 years ago
- Morphological analysis and generation of Amharic, Oromo, and Tigrinya ☆11 · Updated 8 years ago
- The set of files used for the development of the Amharic Corpus. ☆11 · Updated 8 years ago
- Data for the quantitative study of (Vedic) Sanskrit ☆144 · Updated 5 months ago
- Best Practices in Translation Memory Management ☆47 · Updated 7 years ago
- ☆12 · Updated 3 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆90 · Updated 3 months ago
- Sanskrit compound segmentation using seq2seq model ☆26 · Updated 7 years ago
- ☆67 · Updated 5 months ago
- A Machine Learning project to translate Sanskrit text to English ☆51 · Updated 7 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆35 · Updated 2 years ago
- Spanish Billion Word Corpus and Embeddings ☆52 · Updated 3 years ago
- Amharic/Tigrinya/Oromo Dictionaries ☆38 · Updated last week
- Morphological processing for languages of the Horn of Africa ☆54 · Updated last month
- Linguistically analyzed Classical Tibetan texts ☆28 · Updated 4 years ago
- Shami Dialect Corpus (SDC) ☆29 · Updated 8 years ago
- Hotels Arabic-Reviews Dataset ☆33 · Updated 7 years ago
- ☆16 · Updated 6 years ago
- Scripts for compatibilitising between VISL-CG3, Apertium, CoNLL-X and Universal Dependencies ☆17 · Updated 5 years ago
- Natural Language Processing Tutorials (NLP) with Julia and Python ☆247 · Updated last year
- ElixirFM Functional Arabic Morphology ☆45 · Updated 2 years ago
- Crawler for linguistic corpora ☆213 · Updated 5 months ago
- Morphological analyzer and lemmatizer for Latin. ☆28 · Updated 2 months ago
- Fast corpus search engine originally made for the Corpus of Written Tatar language ☆17 · Updated 6 years ago
- Python library to use Google Transliterate API which powers the G Input Tools ☆22 · Updated 4 years ago
- Sentiment Analysis for Arabic Text (tweets, reviews, and standard Arabic) using word2vec ☆95 · Updated last year
- The complete [1 to 5]-gram Gumar Corpus in the style of Google n-grams. ☆11 · Updated 6 years ago
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆115 · Updated last year
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆53 · Updated 2 years ago