slovak-nlp / resources
A curated list of resources, such as tools and datasets, useful for processing the Slovak language
☆19 Updated 2 months ago
Alternatives and similar repositories for resources:
Users who are interested in resources are comparing it to the libraries listed below:
- ☆19 Updated last year
- German Alpaca Dataset (Cleaned + Translated) ☆23 Updated last year
- Interesting links to Slovak NLP tools, utils, corpora and resources. ☆16 Updated 3 years ago
- Live survey of off-the-shelf language identification tools for Python ☆26 Updated 2 years ago
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ☆456 Updated 2 months ago
- A French sequence-to-sequence pretrained model ☆57 Updated 2 years ago
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l… ☆22 Updated 2 years ago
- TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts ☆10 Updated 2 years ago
- Wikipedia text corpus for self-supervised NLP model training ☆41 Updated 2 years ago
- A Scandinavian Benchmark for sentence embeddings ☆31 Updated 3 weeks ago
- This is a neural spell checker ☆62 Updated 2 years ago
- Efficient Attention for Long Sequence Processing ☆91 Updated last year
- ☆44 Updated 5 months ago
- A Python library for calculating a large variety of metrics from text ☆320 Updated last month
- Repository containing the code for training the CroissantLLM ☆21 Updated 11 months ago
- Evaluate language models using multiple choice items ☆12 Updated last week
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: … ☆327 Updated last year
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆66 Updated last year
- A tokenizer and sentence splitter for German and English web and social media texts. ☆137 Updated last month
- Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13 ☆166 Updated 2 months ago
- ☆153 Updated 7 months ago
- SpanMarker for Named Entity Recognition ☆412 Updated last week
- Neural Machine Translation (NMT) tutorial. Data preprocessing, model training, evaluation, and deployment. ☆155 Updated 9 months ago
- Clustering sentence embeddings to extract message intent ☆169 Updated 3 years ago
- A multilingual version of MS MARCO passage ranking dataset ☆143 Updated last year
- Multilingual sentence alignment using sentence embeddings ☆106 Updated 2 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆111 Updated last week
- Annotation Tool for Text Simplification Corpora ☆16 Updated last year
- cLang-8 is a dataset for grammatical error correction. ☆104 Updated 2 years ago
- Some notebooks for NLP ☆189 Updated last year