code-kern-ai / refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
☆1,404 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for refinery
- Open-source natural language enrichments at your fingertips. ☆451 · Updated 7 months ago
- The simplest way to serve AI/ML models in production ☆918 · Updated this week
- 🦘 Explore multimedia datasets at scale ☆1,042 · Updated last month
- An open-source ML pipeline development platform ☆974 · Updated last month
- Blazing fast framework for fine-tuning similarity learning models ☆643 · Updated last month
- 🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day 🤞 ☆717 · Updated last year
- An easy way to extract information from documents ☆1,717 · Updated last year
- The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ ☆3,513 · Updated 2 months ago
- The Virtual Feature Store. Turn your existing data infrastructure into a feature store. ☆1,818 · Updated this week
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆920 · Updated 2 months ago
- Doubt your data, find bad labels. ☆503 · Updated 4 months ago
- Build data pipelines, the easy way 🛠️ ☆4,080 · Updated last year
- A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagem… ☆2,088 · Updated 3 months ago
- The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI C… ☆1,849 · Updated this week
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆3,993 · Updated this week
- Backend that powers the dataset viewer on Hugging Face dataset pages through a public API. ☆698 · Updated this week
- The mitosheet package, trymito.io, and other public Mito code. ☆2,298 · Updated this week
- A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton ☆863 · Updated last year
- With sequence-learn, you can build models for named entity recognition as quickly as if you were building a sklearn classifier. ☆22 · Updated 2 years ago
- 1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems. ☆874 · Updated this week
- Lightwood is Legos for Machine Learning. ☆450 · Updated last week
- The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to s… ☆689 · Updated this week
- Natural Intelligence is still a pretty good idea. ☆798 · Updated 4 months ago
- Efficient few-shot learning with Sentence Transformers ☆2,243 · Updated 2 months ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,012 · Updated 2 months ago
- A Simple Bulk Labelling Tool ☆552 · Updated 3 months ago
- dstack is a lightweight, open-source alternative to Kubernetes & Slurm, simplifying AI container orchestration with multi-cloud & on-prem… ☆1,570 · Updated this week
- Open Source Data Annotation & Labeling Tools ☆513 · Updated 3 months ago
- Represent, send, store and search multimodal data ☆2,985 · Updated last month