Nordstrom / bigdata-profiler
Profiles the data, validates the schema, runs data quality checks, and produces a report.
☆20 · Updated 5 years ago
Alternatives and similar repositories for bigdata-profiler:
Users interested in bigdata-profiler are comparing it to the libraries listed below.
- A library that brings useful functions from various modern database management systems to Apache Spark · ☆58 · Updated last year
- Extensible streaming ingestion pipeline on top of Apache Spark · ☆44 · Updated last year
- Examples for High Performance Spark · ☆15 · Updated 5 months ago
- Spark app to merge different schemas · ☆23 · Updated 4 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection · ☆18 · Updated 8 years ago
- Delta reader for the Ray open-source toolkit for building ML applications · ☆45 · Updated last year
- locopy: Loading/Unloading to Redshift and Snowflake using Python. · ☆107 · Updated this week
- Unity Catalog UI · ☆40 · Updated 7 months ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. · ☆72 · Updated 4 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… · ☆94 · Updated last month
- Rules-based grant management for Snowflake · ☆40 · Updated 6 years ago
- Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. · ☆89 · Updated this week
- A tool to validate data, built around Apache Spark. · ☆101 · Updated 2 weeks ago
- Yet Another (Spark) ETL Framework · ☆20 · Updated last year
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. · ☆75 · Updated 11 months ago
- A simple Spark-powered ETL framework that just works 🍺 · ☆181 · Updated 2 weeks ago
- A bunch of hacks developed around dbt · ☆48 · Updated 5 years ago
- type-class based data cleansing library for Apache Spark SQL · ☆78 · Updated 5 years ago
- The Internals of Spark on Kubernetes · ☆71 · Updated 2 years ago
- Run dbt serverless in the Cloud (AWS) · ☆42 · Updated 5 years ago
- Skeleton project for Apache Airflow training participants to work on. · ☆16 · Updated 4 years ago
- DBT Cloud Plugin for Airflow · ☆38 · Updated 11 months ago
- Multi-stage, config-driven, SQL-based ETL framework using PySpark · ☆25 · Updated 5 years ago
- Flowchart for debugging Spark applications · ☆105 · Updated 6 months ago
- A write-audit-publish implementation on a data lake without the JVM · ☆46 · Updated 8 months ago
- Utility functions for dbt projects running on Spark · ☆32 · Updated 2 months ago
- Nested array transformation helper extensions for Apache Spark · ☆37 · Updated last year
- ☆63 · Updated 5 years ago
- A curated list of dagster code snippets for data engineers · ☆54 · Updated last year
- The go-to demo for public and private dbt Learn · ☆77 · Updated 3 weeks ago