code-kern-ai / refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
☆1,455 · Updated 9 months ago
Alternatives and similar repositories for refinery
Users interested in refinery are comparing it to the libraries listed below.
- The simplest way to serve AI/ML models in production ☆1,064 · Updated this week
- Open-source natural language enrichments at your fingertips. ☆460 · Updated 8 months ago
- An easy way to extract information from documents ☆1,776 · Updated 2 years ago
- 🦘 Explore multimedia datasets at scale ☆1,064 · Updated 9 months ago
- An open-source ML pipeline development platform ☆996 · Updated 8 months ago
- The Virtual Feature Store. Turn your existing data infrastructure into a feature store. ☆1,942 · Updated 2 months ago
- Backend that powers the dataset viewer on Hugging Face dataset pages through a public API. ☆796 · Updated last month
- The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ ☆3,602 · Updated 3 months ago
- Blazing fast framework for fine-tuning similarity learning models ☆657 · Updated 5 months ago
- A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton ☆860 · Updated 2 years ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆926 · Updated last year
- Labelling platform for text using weak supervision. ☆264 · Updated 3 years ago
- 🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day 🤞 ☆720 · Updated 2 years ago
- Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc ☆391 · Updated last year
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,109 · Updated 5 months ago
- Build data pipelines, the easy way 🛠️ ☆4,142 · Updated 2 years ago
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆1,087 · Updated last week
- Open Source Data Annotation & Labeling Tools ☆639 · Updated 2 weeks ago
- Build and share data reports in 100% Python ☆1,401 · Updated last year
- Fast model deployment on any cloud 🚀 ☆176 · Updated last year
- Neural Search ☆333 · Updated last year
- Efficient few-shot learning with Sentence Transformers ☆2,562 · Updated last month
- 🦙 Integrating LLMs into structured NLP pipelines ☆1,309 · Updated 8 months ago
- Fuzzy string matching, grouping, and evaluation. ☆781 · Updated 2 months ago
- What's in your data? Extract schema, statistics and entities from datasets ☆1,516 · Updated last week
- Build animated charts in Jupyter Notebook and similar environments with a simple Python syntax. ☆970 · Updated 6 months ago
- A Simple Bulk Labelling Tool ☆597 · Updated last month
- Open source no-code system for text annotation and building of text classifiers ☆265 · Updated 3 months ago
- 1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems. ☆937 · Updated 7 months ago
- Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and… ☆2,259 · Updated this week