scalingpythonml / scaling-python-with-dask
A work-in-progress book on Dask
☆12 · Updated last year
Alternatives and similar repositories for scaling-python-with-dask:
Users who are interested in scaling-python-with-dask are comparing it to the repositories listed below.
- Example for simple Apache Arrow Flight service with Apache Spark and TensorFlow clients ☆36 · Updated 4 years ago
- Scaling Python Machine Learning ☆45 · Updated last year
- ☆55 · Updated last year
- A proof-of-concept repo that attempts to use Apache Superset with a custom ADBC to Arrow Flight SQL SQLAlchemy driver. ☆23 · Updated last year
- IbisML is a library for building scalable ML pipelines using Ibis. ☆104 · Updated 2 months ago
- Materials for Apache Arrow workshop at VLDB 2019 ☆42 · Updated 4 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 · Updated last year
- Delta Lake helper methods. No Spark dependency. ☆23 · Updated 6 months ago
- Ibis Substrait Compiler ☆99 · Updated this week
- Python binding for DataFusion ☆59 · Updated 2 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated 11 months ago
- Python library to run ML/data pipelines on stateless compute infrastructure (that may be ephemeral or serverless). Please see the documen… ☆18 · Updated last year
- Ray-based Apache Beam runner ☆43 · Updated last year
- Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb ☆20 · Updated last year
- Serverless Python with Ray ☆55 · Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- A series of workshop modules introducing Feast feature store. ☆19 · Updated 2 years ago
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 · Updated last year
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 · Updated last year
- Arrow, pydantic style ☆82 · Updated 2 years ago
- Introduction to Ray Core Design Patterns and APIs. ☆67 · Updated last year
- A write-audit-publish implementation on a data lake without the JVM ☆46 · Updated 7 months ago
- ☆37 · Updated this week
- In-Memory Analytics with Apache Arrow, published by Packt ☆96 · Updated last year
- Sample code to accompany blog post showcasing Arrow Flight SQL running on DuckDB ☆31 · Updated 2 years ago
- ☆27 · Updated last year
- Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning. ☆65 · Updated 4 years ago
- Unified Distributed Execution ☆51 · Updated 4 months ago
- Train Gradient Boosting and Random Forest with only SQL (VLDB 2023) ☆22 · Updated last year