spbail / data-quality-tools
Content for a talk on "The wonderful world of data quality tools in Python"
☆19, updated 3 years ago
Alternatives and similar repositories for data-quality-tools:
Users who are interested in data-quality-tools are comparing it to the libraries listed below.
- Full-stack data engineering tools and infrastructure set-up (☆48, updated 4 years ago)
- A modern ELT demo using Airbyte, dbt, Snowflake, and Dagster (☆26, updated 2 years ago)
- Check the basic quality of any dataset (☆11, updated 3 years ago)
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" (☆36, updated 6 months ago)
- Fake Pandas / PySpark DataFrame creator (☆45, updated 11 months ago)
- Code for a blog post on data quality with Great Expectations (☆12, updated 6 months ago)
- Code for my "Efficient Data Processing in SQL" book. (☆55, updated 6 months ago)
- A simple, easy-to-use Data Quality (DQ) tool built with Python. (☆49, updated last year)
- JumpSpark - A modern cookiecutter template for PySpark projects with batteries included. (☆10, updated last year)
- Cost Efficient Data Pipelines with DuckDB (☆49, updated 6 months ago)
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ (☆11, updated 8 months ago)
- (☆17, updated 6 months ago)
- A Python package to help Databricks Unity Catalog users read and query Delta Lake tables with Polars, DuckDB, or PyArrow. (☆23, updated 10 months ago)
- (☆15, updated 9 months ago)
- Open Data Stack Projects: Examples of End to End Data Engineering Projects (☆75, updated last year)
- Sample project that uses Dagster, dbt, DuckDB, and Dash to visualize the Spanish car and motorcycle market (☆56, updated 2 years ago)
- Evaluation Matrix for Change Data Capture (☆25, updated 6 months ago)
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … (☆21, updated 2 years ago)
- Build your feature store with macros right within your dbt repository (☆38, updated 2 years ago)
- Demo on how to use Prefect with Docker (☆25, updated 2 years ago)
- A write-audit-publish implementation on a data lake without the JVM (☆46, updated 6 months ago)
- (☆33, updated 8 months ago)
- Sample projects using Ploomber. (☆86, updated last year)
- CSV and flat-file sniffer built in Rust. (☆42, updated last year)
- Dagster scikit-learn pipeline example. (☆44, updated last year)
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark DataFrames and functions. It also covers topics … (☆20, updated 3 years ago)
- Data-aware orchestration with Dagster, dbt, and Airbyte (☆31, updated 2 years ago)
- Demo of DuckDB with the dashboarding tools Evidence, Streamlit, and Rill (☆15, updated last year)
- Examples of various flow deployments for Prefect 1.0 (storage and run configurations) (☆35, updated 2 years ago)
- A very simple "hello world" project for deploying Prefect 2 to a Docker container on Google Compute Engine. (☆11, updated 2 years ago)