Nordstrom / bigdata-profiler
Profiles the data, validates the schema, runs data quality checks, and produces a report.
☆20 · Updated 5 years ago
Alternatives and similar repositories for bigdata-profiler:
Users who are interested in bigdata-profiler are comparing it to the libraries listed below:
- Examples for High Performance Spark ☆15 · Updated 6 months ago
- Skeleton project for Apache Airflow training participants to work on. ☆16 · Updated 4 years ago
- Airflow workflow management platform Chef cookbook. ☆71 · Updated 5 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆46 · Updated last year
- A write-audit-publish implementation on a data lake without the JVM ☆46 · Updated 8 months ago
- Rules-based grant management for Snowflake ☆40 · Updated 6 years ago
- A toolset to streamline running Spark Python on EMR ☆20 · Updated 8 years ago
- Yet Another (Spark) ETL Framework ☆21 · Updated last year
- The sane way of building a data layer in Airflow ☆24 · Updated 5 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Updated 5 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆108 · Updated this week
- Magic to help Spark pipelines upgrade ☆34 · Updated 7 months ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- Utility functions for dbt projects running on Spark ☆33 · Updated 2 months ago
- Big Data Demystified meetup and blog examples ☆31 · Updated 8 months ago
- Snowflake Grant Report offers a way of visualizing role hierarchy and rapid diagnosis of as-is permissions, giving customers insight with… ☆75 · Updated 2 years ago
- ☆16 · Updated 4 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 · Updated 8 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark DataFrames ☆63 · Updated 2 years ago
- Unity Catalog UI ☆40 · Updated 8 months ago
- Benchmark data warehouses under Fivetran-like conditions ☆166 · Updated 2 years ago
- Materials for various Hadoop & NiFi related workshops ☆51 · Updated 6 years ago
- Run dbt serverless in the Cloud (AWS) ☆42 · Updated 5 years ago
- A bunch of hacks developed around dbt ☆48 · Updated 5 years ago
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and… ☆28 · Updated 2 years ago
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆33 · Updated last year
- A table-format-agnostic data sharing framework ☆38 · Updated last year
- Snowflake Connector for Dremio using the ARP SDK. ☆16 · Updated 2 years ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 · Updated 2 years ago
- ☆20 · Updated 4 years ago