AndyTheFactory / romanian-nlp-datasets
A list of Romanian NLP Datasets
☆31 Updated last month
Related projects
Alternatives and complementary repositories for romanian-nlp-datasets
- This repo is the home of Romanian Transformers. ☆93 Updated 2 years ago
- A list of Natural Language Processing Tools for Romanian ☆24 Updated 3 years ago
- A novel dataset for emotion detection from Romanian text. ☆15 Updated 3 weeks ago
- A collection of preprocessed datasets and pretrained models for generating paraphrases. ☆29 Updated 3 years ago
- Romanian WordNet (Data + API for Python) ☆49 Updated 4 years ago
- Romanian Semantic Textual Similarity Dataset ☆15 Updated 2 years ago
- A lightweight Python library for constructing, processing, and visualizing constituent trees. ☆63 Updated 2 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆92 Updated this week
- Interpretability for sequence generation models 🐛 🔍 ☆377 Updated last week
- A Python package for benchmarking interpretability techniques on Transformers. ☆212 Updated last month
- Romanian Named Entity Corpus (RONEC) version 2.0 ☆60 Updated 2 years ago
- Evaluation of language models on mono- or multilingual tasks. ☆75 Updated last week
- Clustering sentence embeddings to extract message intent ☆167 Updated 3 years ago
- The multilingual language model for Switzerland ☆25 Updated 10 months ago
- BERT classification model for processing texts longer than 512 tokens. Text is first divided into smaller chunks and after feeding them t… ☆127 Updated 5 months ago
- IndicGenBench is a high-quality, multilingual, multi-way parallel benchmark for evaluating Large Language Models (LLMs) on 4 user-facing … ☆42 Updated 2 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆92 Updated 3 weeks ago
- This repository contains the code for "Generating Datasets with Pretrained Language Models". ☆187 Updated 3 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆103 Updated 6 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 5 months ago
- SpanMarker for Named Entity Recognition ☆403 Updated 3 months ago
- Experiments for data quality in Rasa. ☆34 Updated 2 years ago
- Repository for paper Decrypting Cryptic Crosswords ☆9 Updated 2 years ago
- Some notebooks for NLP ☆188 Updated last year
- A fully customisable language detection pipeline for spaCy ☆93 Updated 5 years ago
- A module to compute textual lexical richness (aka lexical diversity). ☆92 Updated last year
- A Python package to simulate typographical errors. ☆31 Updated 11 months ago