code-kern-ai/refinery

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

☆1,470

Alternatives and similar repositories for refinery

Users that are interested in refinery are comparing it to the libraries listed below.

code-kern-ai / bricks
Open-source natural language enrichments at your fingertips.
☆461 · Jan 14, 2025 · Updated last year
code-kern-ai / sequence-learn
With sequence-learn, you can build models for named entity recognition as quickly as if you were building a sklearn classifier.
☆22 · Oct 20, 2022 · Updated 3 years ago
code-kern-ai / refinery-python-sdk
Official Python SDK for Kern AI refinery.
☆20 · Nov 14, 2024 · Updated last year
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,048 · Updated this week
code-kern-ai / embedders
With embedders, you can easily convert your texts into sentence- or token-level embeddings within a few lines of code. Use cases for this…
☆21 · Jul 14, 2025 · Updated last year
code-kern-ai / refinery-sample-projects
Containing examples of projects you can use to test refinery. Please select the use case from the branches.
☆25 · Aug 7, 2023 · Updated 2 years ago
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆925 · Sep 2, 2024 · Updated last year
code-kern-ai / automl-docker
CLI-based tool to automatically build ML models from training data into a servable Docker container
☆60 · Aug 15, 2022 · Updated 3 years ago
webis-de / small-text
Active Learning for Text Classification in Python
☆646 · May 24, 2026 · Updated 2 months ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,777 · May 26, 2026 · Updated last month
cleanlab / cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data …
☆11,595 · Jan 13, 2026 · Updated 6 months ago
qdrant / quaterion
Blazing fast framework for fine-tuning similarity learning models
☆661 · Jul 6, 2026 · Updated 2 weeks ago
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,750 · Updated this week
davidberenstein1957 / concise-concepts
This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with enti…
☆244 · Jun 19, 2023 · Updated 3 years ago
deepset-ai / haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…
☆25,997 · Updated this week
impira / docquery
An easy way to extract information from documents
☆1,774 · May 3, 2023 · Updated 3 years ago
orchest / orchest
Build data pipelines, the easy way 🛠️
☆4,135 · Jun 6, 2023 · Updated 3 years ago
koaning / doubtlab
Doubt your data, find bad labels.
☆515 · Jul 15, 2024 · Updated 2 years ago
HumanSignal / label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
☆27,909 · Updated this week
HLasse / TextDescriptives
A Python library for calculating a large variety of metrics from text
☆366 · May 5, 2026 · Updated 2 months ago
deepchecks / deepchecks
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML va…
☆4,039 · Dec 28, 2025 · Updated 6 months ago
koaning / embetter
just a bunch of useful embeddings for scikit-learn pipelines
☆527 · Feb 12, 2026 · Updated 5 months ago
koaning / bulk
A Simple Bulk Labelling Tool
☆599 · Jul 29, 2025 · Updated 11 months ago
featureform / featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
☆1,981 · Jul 3, 2025 · Updated last year
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,931 · Updated this week
zenml-io / zenml
ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.
☆5,516 · Updated this week
docarray / docarray
Represent, send, store and search multimodal data
☆3,123 · Mar 27, 2026 · Updated 3 months ago
NannyML / nannyml
nannyml: post-deployment data science in python
☆2,146 · Jul 12, 2025 · Updated last year
jina-ai / serve
☁️ Build multimodal AI applications with cloud-native stack
☆21,862 · Mar 24, 2025 · Updated last year
snorkel-team / snorkel
A system for quickly generating training data with weak supervision
☆5,994 · Jun 8, 2026 · Updated last month
MaartenGr / BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
☆7,754 · May 13, 2026 · Updated 2 months ago
knodle / knodle
A PyTorch-based open-source framework that provides methods for improving the weakly annotated data and allows researchers to efficiently…
☆108 · Sep 10, 2024 · Updated last year
axa-group / Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
☆6,177 · Mar 20, 2026 · Updated 4 months ago
dataqa / nlp-labelling
Labelling platform for text using weak supervision.
☆260 · Jun 24, 2022 · Updated 4 years ago
zenml-io / awesome-open-data-annotation
Open Source Data Annotation & Labeling Tools
☆718 · Jul 6, 2026 · Updated 2 weeks ago
fugue-project / fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew…
☆2,170 · May 19, 2026 · Updated 2 months ago
ploomber / ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
☆3,622 · May 29, 2025 · Updated last year
aimhubio / aim
Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
☆6,204 · Updated this week
IBM / zshot
Zero and Few shot named entity & relationships recognition
☆400 · Sep 17, 2025 · Updated 10 months ago