spbail / data-quality-tools
Content for a talk on "The wonderful world of data quality tools in Python"
☆18 Updated 4 years ago
Alternatives and similar repositories for data-quality-tools
Users who are interested in data-quality-tools are comparing it to the libraries listed below.
- Fake Pandas / PySpark DataFrame creator ☆48 Updated last year
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation. ☆36 Updated 6 years ago
- Full stack data engineering tools and infrastructure set-up ☆57 Updated 4 years ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆81 Updated last year
- Supporting materials/code examples for my course in data engineering for machine learning. ☆39 Updated 3 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 Updated last week
- Sample projects using Ploomber. ☆86 Updated last year
- Cost Efficient Data Pipelines with DuckDB ☆60 Updated 6 months ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆50 Updated 2 years ago
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆28 Updated 2 years ago
- Data-aware orchestration with dagster, dbt, and airbyte ☆30 Updated 2 years ago
- Read Delta tables without any Spark ☆47 Updated last year
- ☆23 Updated last year
- Possibly the fastest DataFrame-agnostic quality check library in town. ☆225 Updated 3 weeks ago
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner ☆71 Updated 3 years ago
- Demo on how to use Prefect with Docker ☆27 Updated 3 years ago
- ☆10 Updated 3 years ago
- A guide for leading a data (engineering) team ☆63 Updated last year
- Pandas helper functions ☆31 Updated 2 years ago
- Check the basic quality of any dataset ☆11 Updated 4 years ago
- New generation opensource data stack ☆75 Updated 3 years ago
- Tutorial for implementing data validation in data science pipelines ☆33 Updated 3 years ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 Updated 5 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data qualit… ☆65 Updated last month
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 Updated 4 years ago
- An automation tool to refactor Jupyter Notebooks to Python modules, with code dependency analysis. ☆12 Updated 9 months ago
- Code examples showing flow deployment to various types of infrastructure ☆111 Updated 2 years ago
- Code examples for the Introduction to Kubeflow course ☆14 Updated 4 years ago
- A Python package to help Databricks Unity Catalog users to read and query Delta Lake tables with Polars, DuckDb, or PyArrow. ☆26 Updated last year