webis-de/small-text

Active Learning for Text Classification in Python

☆646

Alternatives and similar repositories for small-text

Users who are interested in small-text are comparing it to the libraries listed below.

NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆925 · Sep 2, 2024 · Updated last year
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,772 · May 26, 2026 · Updated last month
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,039 · Jul 13, 2026 · Updated last week
koaning / embetter
just a bunch of useful embeddings for scikit-learn pipelines
☆527 · Feb 12, 2026 · Updated 5 months ago
IBM / low-resource-text-classification-framework
Research framework for low resource text classification that allows the user to experiment with classification models and active learning…
☆101 · Mar 9, 2022 · Updated 4 years ago
DFKI-NLP / thermostat
Collection of NLP model explanations and accompanying analysis tools
☆141 · Jun 26, 2023 · Updated 3 years ago
modAL-python / modAL
A modular active learning framework for Python
☆2,357 · Feb 26, 2024 · Updated 2 years ago
explosion / floret
🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy
☆343 · Apr 25, 2025 · Updated last year
HLasse / TextDescriptives
A Python library for calculating a large variety of metrics from text
☆366 · May 5, 2026 · Updated 2 months ago
autonlab / weasel
Weakly Supervised End-to-End Learning (NeurIPS 2021)
☆155 · Mar 20, 2023 · Updated 3 years ago
hscells / pybool_ir
Toolkit for domain-specific information retrieval experimentation
☆19 · May 18, 2026 · Updated 2 months ago
davidberenstein1957 / concise-concepts
This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with enti…
☆244 · Jun 19, 2023 · Updated 3 years ago
TimSchopf / KeyphraseVectorizers
Set of vectorizers that extract keyphrases with part-of-speech patterns from a collection of text documents and convert them into a docum…
☆268 · Nov 8, 2024 · Updated last year
KennethEnevoldsen / augmenty
Augmenty is an augmentation library based on spaCy for augmenting texts.
☆156 · May 24, 2024 · Updated 2 years ago
koaning / doubtlab
Doubt your data, find bad labels.
☆515 · Jul 15, 2024 · Updated 2 years ago
MaartenGr / BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
☆7,748 · May 13, 2026 · Updated 2 months ago
yueyu1030 / COSINE
[NAACL 2021] This is the code for our paper `Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self…
☆205 · Aug 17, 2022 · Updated 3 years ago
MaartenGr / PolyFuzz
Fuzzy string matching, grouping, and evaluation.
☆800 · Jul 10, 2025 · Updated last year
MilaNLProc / contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher…
☆1,272 · Jul 24, 2025 · Updated 11 months ago
tomaarsen / SpanMarkerNER
SpanMarker for Named Entity Recognition
☆477 · Apr 10, 2026 · Updated 3 months ago
koaning / bulk
A Simple Bulk Labelling Tool
☆598 · Jul 29, 2025 · Updated 11 months ago
ddangelov / Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
☆3,104 · Nov 14, 2024 · Updated last year
g8a9 / ferret
A python package for benchmarking interpretability techniques on Transformers.
☆215 · Sep 29, 2024 · Updated last year
makcedward / nlpaug
Data augmentation for NLP
☆4,662 · Updated this week
robustness-gym / summvis
SummVis is an interactive visualization tool for text summarization.
☆253 · Jun 17, 2022 · Updated 4 years ago
cdpierse / transformers-interpret
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.
☆1,416 · Aug 30, 2023 · Updated 2 years ago
GEM-benchmark / NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
☆786 · May 19, 2024 · Updated 2 years ago
erre-quadro / spikex
SpikeX - SpaCy Pipes for Knowledge Extraction
☆403 · Jul 30, 2021 · Updated 4 years ago
knodle / knodle
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently…
☆108 · Sep 10, 2024 · Updated last year
kabirkhan / recon
Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality …
☆104 · Feb 26, 2024 · Updated 2 years ago
microsoft / ASTRA
Self-training with Weak Supervision (NAACL 2021)
☆162 · Jul 24, 2023 · Updated 2 years ago
QData / TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs…
☆3,450 · Apr 17, 2026 · Updated 3 months ago
HuaizhengZhang / Active-Learning-as-a-Service
A scalable & efficient active learning/data selection system for everyone.
☆219 · Jul 8, 2024 · Updated 2 years ago
mourga / contrastive-active-learning
Code for the EMNLP 2021 Paper "Active Learning by Acquiring Contrastive Examples" & the ACL 2022 Paper "On the Importance of Effectively …
☆129 · May 24, 2022 · Updated 4 years ago
ikergarcia1996 / MetaVec
A monolingual and cross-lingual meta-embedding generation and evaluation framework
☆79 · Apr 29, 2022 · Updated 4 years ago
IBM / zshot
Zero and Few shot named entity & relationships recognition
☆400 · Sep 17, 2025 · Updated 10 months ago
davidberenstein1957 / classy-classification
This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s…
☆221 · Jan 20, 2025 · Updated last year
DerwenAI / kglab
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, …
☆688 · Jan 25, 2026 · Updated 5 months ago
asahi417 / tner
Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer…
☆397 · May 11, 2023 · Updated 3 years ago