scalingpythonml / scaling-python-with-dask
A work-in-progress book on Dask
☆12, updated last year
Alternatives and similar repositories for scaling-python-with-dask
Users who are interested in scaling-python-with-dask compare it to the libraries listed below.
- Record matching and entity resolution at scale in Spark (☆34, updated last year)
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. (☆42, updated 2 years ago)
- Fake Pandas / PySpark DataFrame creator (☆46, updated last year)
- Introduction to Ray Core Design Patterns and APIs. (☆68, updated last year)
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients (☆36, updated 4 years ago)
- Materials for Apache Arrow workshop at VLDB 2019 (☆42, updated 4 years ago)
- real-time data + ML pipeline (☆54, updated this week)
- Serverless Python with Ray (☆55, updated 2 years ago)
- Delta reader for the Ray open-source toolkit for building ML applications (☆46, updated last year)
- Scaling Python Machine Learning (☆46, updated last year)
- Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning. (☆65, updated 4 years ago)
- Example of how to perform distributed Bayesian optimisation (AutoML) using Optuna on Metaflow (☆10, updated 3 years ago)
- A write-audit-publish implementation on a data lake without the JVM (☆46, updated 9 months ago)
- Build your feature store with macros right within your dbt repository (☆38, updated 2 years ago)
- Supporting materials/code examples for my course in data engineering for machine learning. (☆38, updated 2 years ago)
- Fuzzy Data Benchmark (☆17, updated last year)
- A series of workshop modules introducing Feast feature store. (☆19, updated 2 years ago)
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… (☆113, updated last year)
- A library that brings useful functions from various modern database management systems to Apache Spark (☆59, updated last year)
- IbisML is a library for building scalable ML pipelines using Ibis. (☆108, updated 4 months ago)
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. (☆29, updated 2 weeks ago)
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects (☆81, updated 3 years ago)
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… (☆29, updated 5 months ago)
- A Python-to-SQL transpiler as replacement for Python Pandas (☆48, updated 2 years ago)
- Ibis analytics, with Ibis (and more!) (☆21, updated 7 months ago)
- Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documen… (☆18, updated last year)
- XGBoost GPU accelerated on Spark example applications (☆52, updated 2 years ago)