spbail / data-quality-tools
Content for a talk on "The wonderful world of data quality tools in Python"
☆18Updated 4 years ago
Alternatives and similar repositories for data-quality-tools
Users who are interested in data-quality-tools are comparing it to the repositories listed below
- Fake Pandas / PySpark DataFrame creator☆48Updated last year
- Full stack data engineering tools and infrastructure set-up☆57Updated 4 years ago
- New generation open-source data stack☆76Updated 3 years ago
- A simple and easy to use Data Quality (DQ) tool built with Python.☆50Updated 2 years ago
- Sample projects using Ploomber.☆86Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou…☆114Updated last month
- A modern ELT demo using airbyte, dbt, snowflake and dagster☆28Updated 3 years ago
- Weekly Data Engineering Newsletter☆97Updated last year
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics …☆20Updated 4 years ago
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market☆58Updated 3 years ago
- Cost Efficient Data Pipelines with DuckDB☆60Updated 7 months ago
- ☆10Updated 3 years ago
- Data-aware orchestration with dagster, dbt, and airbyte☆31Updated 2 years ago
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation.☆36Updated 6 years ago
- Supporting materials/code examples for my course in data engineering for machine learning.☆39Updated 3 years ago
- Record matching and entity resolution at scale in Spark☆36Updated 2 years ago
- Build your feature store with macros right within your dbt repository☆39Updated 3 years ago
- Source code for the MC technical blog post "Data Observability in Practice Using SQL"☆40Updated last year
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in …☆23Updated 3 years ago
- An automation tool to refactor Jupyter Notebooks to Python modules, with code dependency analysis.☆12Updated 9 months ago
- ☆35Updated last week
- Demo on how to use Prefect with Docker☆27Updated 3 years ago
- A portable Datamart and Business Intelligence suite built with Docker, sqlmesh + dbtcore, DuckDB and Superset☆55Updated 2 months ago
- ☆48Updated last year
- ☆93Updated 2 years ago
- ☆23Updated last year
- Pandas helper functions☆31Updated 2 years ago
- ☆72Updated last week
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner☆71Updated 3 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟☆53Updated 3 years ago