allegro / klejbenchmark-baselines
Fine-tuning scripts for evaluating transformer-based models on the KLEJ benchmark.
☆26 Updated 2 years ago
Alternatives and similar repositories for klejbenchmark-baselines
Users who are interested in klejbenchmark-baselines are comparing it to the libraries listed below.
- RoBERTa models for Polish ☆87 Updated 3 years ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically … ☆184 Updated 3 years ago
- Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0. ☆105 Updated 3 years ago
- This repository contains the code for the paper 'PARM: Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval' pu… ☆41 Updated 3 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models ☆65 Updated 2 years ago
- [EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction ☆119 Updated 3 years ago
- ☆86 Updated 4 months ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆47 Updated 2 years ago
- [EMNLP-Findings 2020] Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences ☆63 Updated last year
- Open source library for few shot NLP ☆78 Updated 2 years ago
- Annotated corpus + evaluation metrics for text anonymisation ☆60 Updated 2 weeks ago
- Polish BERT ☆72 Updated 4 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale ☆155 Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 Updated last year
- A multilingual version of MS MARCO passage ranking dataset ☆144 Updated last year
- Experiments for XLM-V Transformers Integration ☆13 Updated 2 years ago
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro… ☆160 Updated 10 months ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆88 Updated last year
- Shared BERT model for 4 languages of Bulgarian, Czech, Polish and Russian. Slavic NER model. ☆76 Updated 3 years ago
- Wikipedia text corpus for self-supervised NLP model training ☆44 Updated 3 years ago
- SQuARE: Software for question answering research. ☆75 Updated last year
- Segment documents into coherent parts using word embeddings. ☆149 Updated 3 years ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models". ☆188 Updated 3 years ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 Updated 2 years ago
- Code for the CRAC 2021 paper "On Generalization in Coreference Resolution" (Best short paper award) ☆35 Updated 2 years ago
- Tool for named entity recognition for Polish based on deep learning. ☆31 Updated 2 years ago
- Official implementation of the paper "CoEdIT: Text Editing by Task-Specific Instruction Tuning" (EMNLP 2023) ☆129 Updated 10 months ago
- Stanford's Alexa Prize socialbot ☆133 Updated last year
- Label data using HuggingFace's transformers and automatically get a prediction service ☆190 Updated 2 years ago
- This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish ☆13 Updated last year