CLARIN-PL / LEPISZCZE
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
☆13Updated 9 months ago
Related projects:
- Embeddings: State-of-the-art Text Representations for Natural Language Processing tasks, an initial version of the library focused on the Polis…☆36Updated 9 months ago
- LTG-Bert☆25Updated 8 months ago
- Simple-to-use scoring function for arbitrarily tokenized texts.☆27Updated last week
- Ranking of fine-tuned HF models as base models.☆35Updated last year
- Bi-encoder entity linking architecture☆40Updated last week
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆66Updated last year
- Truly flash T5 realization!☆48Updated 4 months ago
- 🔗 A graph-augmented dense statute retriever. (EACL 2023)☆17Updated 11 months ago
- Generalist and Lightweight Model for Text Classification☆29Updated 2 weeks ago
- Entailment self-training☆26Updated last year
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations.☆12Updated 2 weeks ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text)☆77Updated this week
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆42Updated 10 months ago
- Repository for "Attribute First, then Generate: Locally-attributable Grounded Text Generation", ACL 2024☆25Updated 5 months ago
- Embedding Recycling for Language models☆38Updated last year
- Minimum Bayes Risk Decoding for Hugging Face Transformers☆51Updated 3 months ago
- This repository provides scripts for evaluating NLP models on the LEXTREME benchmark, a set of diverse multilingual tasks in legal NLP☆19Updated 8 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…☆22Updated 5 months ago
- Experiments for XLM-V Transformers Integration☆13Updated last year
- Multi-task model for named-entity recognition, relation extraction, entity mention detection and coreference resolution.☆42Updated 2 months ago
- RaKUn 2.0 - A fast keyword detection algorithm☆61Updated last month
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset.☆91Updated last year
- Code for SaGe subword tokenizer (EACL 2023)☆21Updated this week
- Source code and data for Like a Good Nearest Neighbor☆28Updated 7 months ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr…☆21Updated last month
- Dataset for cross-lingual legal text summarization from EUR-Lex document summaries☆12Updated 6 months ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets.☆11Updated 9 months ago