code-kern-ai / refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
☆1,392 Updated 3 months ago
Related projects:
- Open-source natural language enrichments at your fingertips. ☆447 Updated 5 months ago
- An easy way to extract information from documents ☆1,694 Updated last year
- An open-source ML pipeline development platform ☆969 Updated last week
- The simplest way to serve AI/ML models in production ☆880 Updated this week
- The Virtual Feature Store. Turn your existing data infrastructure into a feature store. ☆1,793 Updated this week
- 🦘 Explore multimedia datasets at scale ☆1,040 Updated 4 months ago
- Build data pipelines, the easy way 🛠️ ☆4,055 Updated last year
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets ☆3,797 Updated this week
- The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ ☆3,479 Updated this week
- Efficient few-shot learning with Sentence Transformers ☆2,143 Updated this week
- A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton ☆863 Updated last year
- Backend that powers the dataset viewer on Hugging Face dataset pages through a public API. ☆677 Updated last week
- Blazing fast framework for fine-tuning similarity learning models ☆633 Updated 2 months ago
- A low code Machine Learning personalized ranking service for articles, listings, search results, recommendations that boosts user engagem… ☆2,058 Updated last month
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆1,970 Updated last month
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆945 Updated this week
- What's in your data? Extract schema, statistics and entities from datasets ☆1,418 Updated 2 months ago
- Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metada… ☆1,723 Updated this week
- AI code-writing assistant that understands data content ☆2,232 Updated 7 months ago
- dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or … ☆1,320 Updated this week
- ZenML 🙏: The bridge between ML and Ops. https://zenml.io. ☆3,936 Updated this week
- Distributed DataFrame for Python designed for the cloud, powered by Rust ☆2,080 Updated this week
- ☆427 Updated this week
- Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to wr… ☆1,778 Updated this week
- Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, v… ☆3,789 Updated this week
- Natural Intelligence is still a pretty good idea. ☆792 Updated 2 months ago
- Build and share data reports in 100% Python ☆1,370 Updated 11 months ago
- skweak: A software toolkit for weak supervision applied to NLP tasks ☆918 Updated 2 weeks ago
- The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI C… ☆1,831 Updated 2 weeks ago
- Visualise your Kedro data and machine-learning pipelines and track your experiments. ☆666 Updated this week