CLARIN-PL / LEPISZCZE
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
☆13Updated last year
Alternatives and similar repositories for LEPISZCZE:
Users who are interested in LEPISZCZE compare it to the libraries listed below:
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks; an initial version of the library focuses on the Polis…☆36Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers☆57Updated 9 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆65Updated 2 years ago
- A Python package for benchmarking interpretability techniques on Transformers.☆213Updated 6 months ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning☆176Updated 2 months ago
- A Python library aimed at dissecting and augmenting NER training data.☆58Updated last year
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹☆30Updated 9 months ago
- Generalist and Lightweight Model for Text Classification☆92Updated this week
- A Python package for running inference with HuggingFace language and vision-language checkpoints, wrapping many convenient features.☆27Updated 6 months ago
- Pre-train Static Word Embeddings☆51Updated 3 weeks ago
- Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.☆26Updated last year
- Code associated with the paper "Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists"☆48Updated 2 years ago
- Bi-encoder entity linking architecture☆44Updated 6 months ago
- Completion After Prompt Probability. Make your LLM make a choice☆75Updated 5 months ago
- RaKUn 2.0 - A fast keyword detection algorithm☆66Updated last month
- A spaCy custom component that extracts and normalizes temporal expressions☆54Updated 2 years ago
- Source code and data for Like a Good Nearest Neighbor☆28Updated 2 months ago
- RATransformers 🐭- Make your transformer (like BERT, RoBERTa, GPT-2 and T5) Relation Aware!☆41Updated 2 years ago
- CLIR version of ColBERT☆67Updated 3 weeks ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention☆98Updated 2 weeks ago
- A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.☆56Updated 8 months ago
- The most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selec…☆16Updated last year
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP☆21Updated last year
- Ranking of fine-tuned HF models as base models.☆35Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy.☆78Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset.☆93Updated 2 years ago
- TimeLMs: Diachronic Language Models from Twitter☆108Updated last year
- Truly flash T5 realization!☆64Updated 10 months ago