AzBuki-ML / public-data
Custom-built Bulgarian language data sets, used by АзБуки.ML for sentiment analysis, text classification, summarisation and generation. Open-source & free to use in any ML project.
☆18 · Updated last year
Alternatives and similar repositories for public-data
Users who are interested in public-data are comparing it to the libraries listed below.
- Romanian Named Entity Corpus (RONEC) version 2.0 ☆65 · Updated 2 years ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri… ☆31 · Updated 3 years ago
- A Scandinavian Benchmark for sentence embeddings ☆40 · Updated 3 months ago
- This repo is the home of Romanian Transformers. ☆105 · Updated 2 years ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆342 · Updated 3 years ago
- Pre-trained models and language resources for Natural Language Processing in Polish ☆351 · Updated last year
- Norwegian Transformer Model ☆116 · Updated 9 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆106 · Updated last week
- Curated list of Ukrainian natural language processing (NLP) resources (corpora, pretrained models, libraries, etc.) ☆210 · Updated last month
- A lexical normalizer for historical spelling variants using a transformer architecture. ☆10 · Updated 5 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆171 · Updated 2 months ago
- State-of-the-art count-based word embeddings for low-resource languages with a special focus on historical languages. ☆11 · Updated 5 months ago
- The website for Danish Foundation Models, a project for training foundational Danish language models. ☆74 · Updated last week
- The robust European language model benchmark. ☆120 · Updated this week
- Open German WordNet ☆96 · Updated last year
- Linguistic Reconstruction with LingPy ☆14 · Updated last year
- A Python Wiktionary Parser ☆363 · Updated last month
- XML files for the works in the First Thousand Years of Greek Project. Please see our Wiki on how to contribute. ☆99 · Updated 3 weeks ago
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ☆492 · Updated 10 months ago
- ☆28 · Updated 11 months ago
- Data for the quantitative study of (Vedic) Sanskrit ☆128 · Updated last week
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆82 · Updated 3 months ago
- Data for the International Phonetic Alphabet (IPA) ☆33 · Updated 2 years ago
- A character-wise tokenizer for morphologically rich languages ☆28 · Updated 5 months ago
- A curated list of resources such as tools and datasets useful for the processing of Slovak language ☆22 · Updated last month
- A program that sets the stress and the letter ё of Russian text and ebooks using Wiktionary data and grammar analysis. ☆29 · Updated last year
- A catalogue of links to useful language resources for the Bashkir language is collected here ☆14 · Updated last year
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆47 · Updated 2 years ago
- Python Finite-State Toolkit ☆58 · Updated last week
- Curated corpus of parallel data derived from versions of the Bible provided by eBible.org. ☆72 · Updated 3 months ago