fugue-project / tutorials
Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask without any rewrites.
☆113 Updated last year
Alternatives and similar repositories for tutorials
Users who are interested in tutorials are comparing it to the libraries listed below.
- Fake Pandas / PySpark DataFrame creator ☆47 Updated last year
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆190 Updated last week
- ✨ A Pydantic to PySpark schema library ☆91 Updated this week
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆54 Updated 8 months ago
- IbisML is a library for building scalable ML pipelines using Ibis. ☆109 Updated 5 months ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆50 Updated last year
- Pandas helper functions ☆31 Updated 2 years ago
- Make simple storing test results and visualisation of these in a BI dashboard ☆44 Updated 2 months ago
- Delta Lake helper methods. No Spark dependency. ☆23 Updated 8 months ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆216 Updated 3 weeks ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆80 Updated last year
- Read Delta tables without any Spark ☆47 Updated last year
- An abstraction layer for parameter tuning ☆35 Updated 9 months ago
- Data-aware orchestration with dagster, dbt, and airbyte ☆31 Updated 2 years ago
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆188 Updated this week
- A Python package to help Databricks Unity Catalog users to read and query Delta Lake tables with Polars, DuckDb, or PyArrow. ☆23 Updated last year
- Demo of Streamlit application with Databricks SQL Endpoint ☆35 Updated 2 years ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 Updated 2 years ago
- Templates for your Kedro projects. ☆76 Updated last week
- Code examples showing flow deployment to various types of infrastructure ☆106 Updated 2 years ago
- A FastMCP tool to search and retrieve Polars API documentation. ☆60 Updated last week
- PySpark schema generator ☆42 Updated 2 years ago
- Plugins, extensions, case studies, articles, and video tutorials for Kedro ☆76 Updated 5 months ago
- First-party plugins maintained by the Kedro team. ☆103 Updated this week
- A curated list of dagster code snippets for data engineers ☆55 Updated last year
- Cost Efficient Data Pipelines with DuckDB ☆53 Updated 2 weeks ago
- Repo for orienting dbt users to the Dagster asset framework ☆54 Updated 2 years ago
- The easiest way to integrate Kedro and Great Expectations ☆52 Updated 2 years ago
- New generation opensource data stack ☆68 Updated 3 years ago
- Delta Lake Documentation ☆49 Updated 11 months ago