spbail / data-quality-tools
Content for a talk on "The wonderful world of data quality tools in Python"
☆18 · Updated 4 years ago
Alternatives and similar repositories for data-quality-tools
Users who are interested in data-quality-tools are comparing it to the libraries listed below.
- Tutorial for implementing data validation in data science pipelines ☆33 · Updated 3 years ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆51 · Updated 2 years ago
- Fake Pandas / PySpark DataFrame creator ☆48 · Updated last year
- Check the basic quality of any dataset ☆12 · Updated 4 years ago
- Cost Efficient Data Pipelines with DuckDB ☆61 · Updated 8 months ago
- Data-aware orchestration with dagster, dbt, and airbyte ☆31 · Updated 3 years ago
- Full stack data engineering tools and infrastructure set-up ☆57 · Updated 4 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆39 · Updated 3 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 · Updated 2 months ago
- Sample projects using Ploomber. ☆86 · Updated 2 years ago
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆28 · Updated 3 years ago
- Record matching and entity resolution at scale in Spark ☆36 · Updated 2 years ago
- ☆11 · Updated 4 years ago
- New generation opensource data stack ☆76 · Updated 3 years ago
- Build your feature store with macros right within your dbt repository ☆39 · Updated 3 years ago
- Mapping of DWH database tables to business entities, attributes & metrics in Python, with automatic creation of flattened tables ☆75 · Updated 2 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 · Updated 4 years ago
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner ☆71 · Updated 4 years ago
- An automation tool to refactor Jupyter Notebooks to Python modules, with code dependency analysis. ☆12 · Updated 11 months ago
- Demo on how to use Prefect with Docker ☆27 · Updated 3 years ago
- Data engineering with dbt, published by Packt ☆89 · Updated 4 months ago
- PipeRider dbt workshop for DataTalksClub DE Zoomcamp ☆18 · Updated 2 years ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆83 · Updated last year
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆225 · Updated 9 months ago
- Read Delta tables without any Spark ☆47 · Updated last year
- Streamlit EDA Dashboard Powered by AWS Cloud ☆84 · Updated 7 months ago
- Code examples showing flow deployment to various types of infrastructure ☆110 · Updated 3 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 · Updated 4 years ago
- ☆31 · Updated 2 years ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆25 · Updated 3 years ago