AzBuki-ML / public-data
Custom-built Bulgarian language data sets, used by АзБуки.ML for sentiment analysis, text classification, summarisation and generation. Open-source & free to use in any ML project.
☆18 · Updated last year
Alternatives and similar repositories for public-data
Users interested in public-data are comparing it to the repositories listed below.
- Linguistic Reconstruction with LingPy ☆14 · Updated last year
- Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.) ☆214 · Updated 2 weeks ago
- Data powering ashtadhyayi.com ☆52 · Updated last week
- Data for the quantitative study of (Vedic) Sanskrit ☆135 · Updated last month
- LoanPy is a linguistic toolkit for rule-based prediction and evaluation of loanword adaptation and historical reconstructions and can be … ☆16 · Updated last year
- LingPy: Python library for quantitative tasks in historical linguistics ☆137 · Updated 2 months ago
- ☆18 · Updated 2 weeks ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆107 · Updated 2 weeks ago
- A general-purpose NLP pipeline for Ancient Greek ☆23 · Updated last year
- ☆28 · Updated last year
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆31 · Updated 3 months ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri… ☆33 · Updated 3 years ago
- Collatinus Python Lemmatizer ☆10 · Updated 4 years ago
- All languages stopwords collection ☆458 · Updated last year
- Open German WordNet ☆97 · Updated 2 weeks ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆345 · Updated 3 years ago
- Latin BERT ☆66 · Updated last year
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ☆502 · Updated 11 months ago
- Stemmer for German ☆45 · Updated 3 years ago
- Extension for pie to include taggers with their models and pre/postprocessors ☆10 · Updated last year
- Wiktionary dump file parser and multilingual data extractor ☆1,015 · Updated this week
- A Parallel Russian-Simple Russian Dataset ☆10 · Updated 2 years ago
- German part-of-speech dictionary ☆45 · Updated 2 years ago
- Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Langua… ☆38 · Updated 3 years ago
- Morphological analysis for Udmurt. ☆12 · Updated last month
- Access to lexical databases ☆142 · Updated 2 weeks ago
- ☆27 · Updated 7 months ago
- Python library for automatic analysis of Ancient Greek hexameter. The algorithm uses linguistic rules and finite-state technology. ☆22 · Updated last year
- Access a database of word frequencies, in various natural languages. ☆1,549 · Updated 9 months ago
- Ancient Greek language models for spaCy ☆32 · Updated 6 months ago