code-kern-ai / refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
★1,412 Updated 2 months ago
Alternatives and similar repositories for refinery:
Users that are interested in refinery are comparing it to the libraries listed below
- Open-source natural language enrichments at your fingertips. ★455 Updated last month
- An open-source ML pipeline development platform ★979 Updated last month
- Explore multimedia datasets at scale ★1,053 Updated 2 months ago
- A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton ★861 Updated last year
- The Virtual Feature Store. Turn your existing data infrastructure into a feature store. ★1,845 Updated this week
- 🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day 🤞 ★718 Updated last year
- The simplest way to serve AI/ML models in production ★947 Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ★2,040 Updated 4 months ago
- Natural Intelligence is still a pretty good idea. ★801 Updated 7 months ago
- The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ ★3,543 Updated 5 months ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ★4,319 Updated this week
- An easy way to extract information from documents ★1,737 Updated last year
- Zoomable, animated scatterplots in the browser that scales over a billion points ★1,088 Updated 3 weeks ago
- Distributed data engine for Python/SQL designed for the cloud, powered by Rust ★2,541 Updated this week
- Doubt your data, find bad labels. ★508 Updated 7 months ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ★923 Updated 5 months ago
- Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metada… ★2,023 Updated last week
- With sequence-learn, you can build models for named entity recognition as quickly as if you were building a sklearn classifier. ★22 Updated 2 years ago
- nannyml: post-deployment data science in python ★2,019 Updated last month
- Represent, send, store and search multimodal data ★3,013 Updated 2 months ago
- UnionML: the easiest way to build and deploy machine learning microservices ★335 Updated last year
- Software that makes labeling PDFs easy. ★405 Updated 9 months ago
- A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagem… ★2,105 Updated 3 weeks ago
- dstack is a lightweight, open-source alternative to Kubernetes & Slurm, simplifying AI container orchestration with multi-cloud & on-prem… ★1,670 Updated this week
- What's in your data? Extract schema, statistics and entities from datasets ★1,458 Updated last week
- Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications,… ★285 Updated this week
- Malloy is an experimental language for describing data relationships and transformations. ★2,056 Updated this week
- Build data pipelines, the easy way 🛠️ ★4,110 Updated last year
- Jupyter extensions that help you write code faster: Context aware AI Chat, Autocomplete, and Spreadsheet ★2,348 Updated this week
- just a bunch of useful embeddings for scikit-learn pipelines ★480 Updated 3 weeks ago