CLARIN-PL / LEPISZCZE
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
☆13 Updated 11 months ago
Related projects
Alternatives and complementary repositories for LEPISZCZE
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of library focus on the Polis… ☆36 Updated 11 months ago
- Bi-encoder entity linking architecture ☆42 Updated 2 months ago
- LTG-Bert ☆29 Updated 10 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆65 Updated last year
- Simple-to-use scoring function for arbitrarily tokenized texts. ☆32 Updated 3 weeks ago
- A Python library aimed at dissecting and augmenting NER training data. ☆56 Updated last year
- RaKUn 2.0 - A fast keyword detection algorithm ☆65 Updated 3 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 Updated 5 months ago
- Generalist and Lightweight Model for Text Classification ☆51 Updated last week
- 🕸️ A graph-augmented dense statute retriever. (EACL 2023) ☆19 Updated last year
- Ranking of fine-tuned HF models as base models. ☆35 Updated last year
- Source code and data for Like a Good Nearest Neighbor ☆28 Updated 9 months ago
- Are foundation LMs multilingual knowledge bases? (EMNLP 2023) ☆18 Updated 11 months ago
- A weak supervision framework for (partial) labeling functions ☆14 Updated 4 months ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆23 Updated 3 months ago
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP ☆20 Updated 10 months ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆21 Updated last month
- Norwegian question answering dataset ☆13 Updated 9 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆92 Updated last year
- A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently… ☆106 Updated 2 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- Embedding Recycling for Language models ☆38 Updated last year
- Modalities, a PyTorch-native framework for distributed and reproducible foundation model training. ☆64 Updated this week
- Materials for "IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation" 🇮🇹 ☆30 Updated 5 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆90 Updated 3 weeks ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆22 Updated this week
- Experiments for XLM-V Transformers Integration ☆13 Updated last year
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆90 Updated 8 months ago