scalingpythonml / scaling-python-with-dask
A work-in-progress book on Dask
☆12 Updated last year
Alternatives and similar repositories for scaling-python-with-dask:
Users who are interested in scaling-python-with-dask compare it to the libraries listed below.
- ☆54 Updated last year
- Example of a simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 Updated 3 years ago
- Scaling Python Machine Learning ☆45 Updated last year
- IbisML is a library for building scalable ML pipelines using Ibis. ☆100 Updated last month
- A series of workshop modules introducing Feast feature store. ☆19 Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆44 Updated last year
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated 6 months ago
- Materials for Apache Arrow workshop at VLDB 2019 ☆42 Updated 4 years ago
- Serverless Python with Ray ☆54 Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Introduction to Ray Core Design Patterns and APIs. ☆66 Updated last year
- ☆15 Updated 5 years ago
- Real-time data + ML pipeline ☆54 Updated 2 weeks ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated last year
- Comparison of big data technologies for cleaning, manipulating, and generally wrangling data for the purposes of analysis and machine learning. ☆65 Updated 4 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 Updated 2 months ago
- RedisAI integration for MLflow ☆30 Updated last year
- Fake Pandas / PySpark DataFrame creator ☆45 Updated 11 months ago
- Fuzzy Data Benchmark ☆17 Updated last year
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- PySpark phonetic and string matching algorithms ☆39 Updated 11 months ago
- Example of how to perform distributed Bayesian optimisation (AutoML) using Optuna on Metaflow ☆10 Updated 3 years ago
- 🚕 Self-contained demo using Redpanda, Materialize, River, Redis, and Streamlit to predict taxi trip durations ☆47 Updated last year
- Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with … ☆47 Updated 3 weeks ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆104 Updated last year
- Pandas helper functions ☆30 Updated last year
- Apache DataFusion Benchmarks ☆16 Updated 3 months ago
- Python binding for DataFusion ☆59 Updated 2 years ago