MDS-BD / hands-on-great-expectations-with-spark

How to evaluate the Quality of your Data with Great Expectations and Spark.

☆31

Alternatives and similar repositories for hands-on-great-expectations-with-spark

Users that are interested in hands-on-great-expectations-with-spark are comparing it to the libraries listed below

Nike-Inc / spark-expectations
A Python library to support running data quality rules while the Spark job is running ⚡
☆193 Updated this week
delta-io / delta-examples
Delta Lake examples
☆233 Updated last year
mrpowers-io / jodie
Delta lake and filesystem helper methods
☆51 Updated last year
jacopotagliabue / paas-data-ingestion
Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner
☆71 Updated 3 years ago
alexott / spark-playground
Playing with different Apache Spark packages
☆31 Updated 2 months ago
rafaelpierre / pyjaws
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
☆44 Updated last month
jaceklaskowski / spark-delta-lake-workshop
Spark and Delta Lake Workshop
☆22 Updated 3 years ago
MrPowers / mack
Delta Lake helper methods in PySpark
☆324 Updated last year
Nike-Inc / brickflow
Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
☆222 Updated last week
holdenk / high-performance-spark-examples
Examples for High Performance Spark
☆16 Updated last month
StabRise / spark-pdf
PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them
☆77 Updated 7 months ago
bartosz25 / spark-playground
Code snippets used in demos recorded for the blog.
☆37 Updated last month
dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…
☆97 Updated last week
BauplanLabs / no-jvm-wap-with-iceberg
A write-audit-publish implementation on a data lake without the JVM
☆45 Updated last year
SemyonSinchenko / flake8-pyspark-with-column
A flake8 plugin that detects usage of withColumn in a loop or inside reduce
☆28 Updated 5 months ago
MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60 Updated 3 years ago
mrpowers-io / spark-style-guide
Spark style guide
☆265 Updated last year
adidas / lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…
☆275 Updated last month
funkyminds / cleanframes
Type-class-based data cleansing library for Apache Spark SQL
☆78 Updated 6 years ago
SETL-Framework / setl
A simple Spark-powered ETL framework that just works 🍺
☆182 Updated last month
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 Updated 3 years ago
quby-io / databricks-workflow
Example of a scalable IoT data processing pipeline setup using Databricks
☆32 Updated 4 years ago
sodadata / soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
☆64 Updated 3 years ago
holdenk / spark-upgrade
Magic to help Spark pipelines upgrade
☆34 Updated last year
julioasotodv / spark-df-profiling
Create HTML profiling reports from Apache Spark DataFrames
☆197 Updated 5 years ago
AbePabbathi / lakehouse-tacklebox
This repo is a collection of tools to deploy, manage, and operate a Databricks-based Lakehouse.
☆46 Updated 10 months ago
souvik-databricks / dlt-with-debug
A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT …
☆49 Updated 2 years ago
target / data-validator
A tool to validate data, built around Apache Spark.
☆100 Updated last week
feast-dev / feast-gcp-fraud-tutorial
Resources backing the Feast fraud tutorial on GCP
☆14 Updated 3 years ago
rajagurunath / lakehouse-sharing
A table-format-agnostic data sharing framework
☆42 Updated last year