scalingpythonml / scaling-python-with-dask
A work-in-progress book on Dask
Related projects:
- Record matching and entity resolution at scale in Spark
- Example of a simple Apache Arrow Flight service with Apache Spark and TensorFlow clients
- A series of workshop modules introducing the Feast feature store
- An abstraction layer for parameter tuning
- Introduction to Ray Core Design Patterns and APIs.
- Big data technology comparisons for cleaning, manipulating, and generally wrangling data for analysis and machine learning.
- Scaling Python Machine Learning
- Spark implementation of computing Shapley values using Monte Carlo approximation
- Example of how to perform distributed Bayesian optimisation (AutoML) using Optuna on Metaflow
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models.
- Accelerator to rapidly deploy customized features for your business
- Instant search for and access to many datasets in PySpark.
- IbisML is a library for building scalable ML pipelines using Ibis.
- Serverless Python with Ray
- Fake Pandas / PySpark DataFrame creator
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou…
- PySpark phonetic and string matching algorithms
- Real-time data + ML pipeline
- Repository for my master's thesis on automated string handling
- Materials for Apache Arrow workshop at VLDB 2019
- Feature Engine for real-time AI/ML
- Inspect ML Pipelines in Python in the form of a DAG
- Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptio…
- Pandas helper functions
- 🚕 Self-contained demo using Redpanda, Materialize, River, Redis, and Streamlit to predict taxi trip durations
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects
- Distributed Bayesian Entity Resolution in Apache Spark