AzBuki-ML / public-data
Custom-built Bulgarian language data sets, used by АзБуки.ML for sentiment analysis, text classification, summarisation and generation. Open-source & free to use in any ML project.
☆18 Updated last year
Alternatives and similar repositories for public-data:
Users interested in public-data are comparing it to the repositories listed below.
- Collection and resources for Bulgarian Corpus, Datasets and Models used in ASR, TTS or NLP tasks together with the links of corresponding… ☆24 Updated 4 years ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 Updated 3 weeks ago
- Public repository for Coptic SCRIPTORIUM Corpora Releases ☆34 Updated 3 months ago
- A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Sp… ☆29 Updated 3 years ago
- Collected files from thelatinlibrary.com ☆21 Updated 5 years ago
- Ancient Greek lemmatisation tool ☆22 Updated 3 years ago
- Hunspell-based analysis for Elasticsearch ☆79 Updated last month
- German part-of-speech dictionary ☆44 Updated last year
- Python Teaching, Seminars for 2nd year students of School of Linguistics NRU HSE ☆29 Updated 7 years ago
- The curation repository for the data behind Concepticon. ☆38 Updated last month
- eXtensible Interlinear Glossed Text ☆32 Updated 2 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- Bulgarian wordlists (списък с думи на Български език) ☆86 Updated 2 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆154 Updated 4 months ago
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat… ☆150 Updated 3 months ago
- Python Unicode Block Utilities ☆24 Updated 4 years ago
- This repository contains code behind the visualization of the Wikimedia tool etytree at http://tools.wmflabs.org/etytree/ ☆51 Updated 5 years ago
- Official releases of the TOROT treebank ☆9 Updated 5 years ago
- 110k Dutch Book Reviews Dataset for Sentiment Analysis ☆29 Updated last year
- A list of vocabulary lists ☆21 Updated 4 years ago
- Morphological Dictionaries for German Language ☆28 Updated 6 years ago
- Offline bilingual dictionaries made using data from Wiktionary ☆53 Updated 9 years ago
- Stand-off Text Annotation Model (STAM) is a data model for stand-off-text annotation where any information on a text is represented as an… ☆18 Updated 4 months ago
- Lexical data at Unicode ☆68 Updated 7 months ago
- Extract data from German Wiktionary XML files. ☆26 Updated 3 months ago
- Grammar rules and dictionaries for the phonetic transcription of Russian sentences ☆33 Updated 3 years ago
- "Fundamentals of Computer Programming with C#" Book ☆13 Updated 5 years ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus ☆18 Updated 10 months ago
- A draughts (checkers) library for Python with move generation, PDN reading and writing, engine communication and balloted openings ☆17 Updated 2 months ago