Toloka / crowd-kit
Control the quality of your labeled data with the Python tools you already know.
☆211Updated last week
Related projects:
- Toloka-Kit is a Python library for working with the Toloka API.☆200Updated 2 months ago
- BSNLP 2021☆32Updated 2 years ago
- Interface for easier topic modelling.☆140Updated last month
- Question answering in Russian with XLMRobertaLarge as a service☆21Updated 2 years ago
- Tools for shrinking fastText models (in gensim format)☆168Updated 4 months ago
- A small library with distillation, quantization and pruning pipelines☆26Updated 3 years ago
- A repository for Toloka tools.☆13Updated 3 months ago
- A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision☆90Updated 2 years ago
- Pipeline for fast building text classification TF-IDF + LogReg baselines.☆63Updated 2 years ago
- Augmentex — a library for augmenting texts with errors☆48Updated 2 months ago
- RUSSE 2022: Russian Text Detoxification Based on Parallel Corpora☆20Updated 2 years ago
- Infrastructure for starting a Telegram bot project: Postgres, MinIO, Grafana, Alembic☆21Updated 2 years ago
- A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision☆215Updated last month
- Gazeta: Dataset for automatic summarization of Russian news☆30Updated 2 years ago
- NLP workshop at Datafest Siberia 2019☆22Updated last year
- Active learning☆79Updated last year
- RuSimpleSentEval (RSSE) shared task repo☆21Updated 3 years ago
- 2nd place solution for Next Like prediction task☆52Updated last year
- Code and data for the paper "Methods for Detoxification of Texts for the Russian Language"☆45Updated 3 weeks ago
- Probing suite for evaluation of Russian embedding and language models☆32Updated 2 years ago
- Russian Corpus of Linguistic Acceptability☆40Updated last year
- SAGE: Spelling correction, corruption and evaluation for multiple languages☆129Updated last month
- ☆29Updated last year
- ☆12Updated 2 years ago
- Clustering with maximum diameter (maximum distance between points inside clusters).☆24Updated 2 years ago
- PyTorch library for end-to-end transformer model training, inference and serving☆70Updated 2 years ago
- Russian dialog datasets parsers and crawlers.☆15Updated 3 years ago
- Distillation of BERT model with catalyst framework☆75Updated last year
- ☆58Updated 7 months ago
- "Rossiya Segodnya" news dataset☆45Updated 4 years ago