piotr-kalanski / data-quality-monitoring
Data Quality Monitoring Tool
☆15 · Updated 8 years ago
Alternatives and similar repositories for data-quality-monitoring
Users who are interested in data-quality-monitoring are comparing it to the libraries listed below.
- DataQuality for BigData ☆145 · Updated 2 years ago
- Create HTML profiling reports from Apache Spark DataFrames ☆197 · Updated 5 years ago
- Spark package for checking data quality ☆222 · Updated 5 years ago
- Tool to automate data quality checks on data pipelines ☆257 · Updated 3 years ago
- Repository of sample Databricks notebooks ☆273 · Updated last year
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Updated 6 years ago
- Data ingestion library for Amundsen to build graph and search index ☆204 · Updated last year
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Machine Learning Pipeline Stages for Spark (exposed in Scala/Java + Python) ☆74 · Updated 2 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆26 · Updated 6 years ago
- ☆69 · Updated 4 years ago
- Spark data source for Salesforce ☆81 · Updated last year
- ☆107 · Updated 3 years ago
- Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm ☆103 · Updated last year
- Real-world Spark pipelines examples ☆83 · Updated 7 years ago
- Apache Spark (Scala, PySpark, SparkR) Code, Tricks, and References ☆69 · Updated 6 years ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆586 · Updated last year
- ☆202 · Updated 2 years ago
- Building blocks and patterns for building data prep transformations and feature engineering in Spark. ☆16 · Updated 9 years ago
- Apache Spark ETL Utilities ☆39 · Updated last year
- Front-end service library for Amundsen ☆279 · Updated last month
- Python API for Deequ ☆41 · Updated 5 years ago
- A boilerplate for writing PySpark Jobs ☆394 · Updated last year
- This project describes how to write full ETL data pipeline using spark. ☆15 · Updated 3 years ago
- An open-source, vendor-neutral data context service. ☆160 · Updated 7 years ago
- Databricks - Apache Spark™ - 2X Certified Developer ☆265 · Updated 5 years ago
- Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks ☆361 · Updated 8 years ago
- Data Lineage Tracking And Visualization Solution ☆650 · Updated last week
- Repository used for Spark Trainings ☆54 · Updated 2 years ago
- A visual ETL development and debugging tool for big data ☆154 · Updated 3 years ago