FRosner/drunken-data-quality

Spark package for checking data quality

☆220

Alternatives and similar repositories for drunken-data-quality

Users who are interested in drunken-data-quality are comparing it to the libraries listed below.

FRosner / spawncamping-dds
Data-Driven Spark allows quick data exploration based on Apache Spark.
☆29 · Jan 6, 2017 · Updated 9 years ago
agile-lab-dev / DataQuality
DataQuality for BigData
☆149 · Dec 15, 2023 · Updated 2 years ago
piotr-kalanski / data-quality-monitoring
Data Quality Monitoring Tool
☆15 · Dec 5, 2017 · Updated 8 years ago
funkyminds / cleanframes
Type-class-based data cleansing library for Apache Spark SQL
☆79 · Jun 23, 2019 · Updated 7 years ago
databrickslabs / dataframe-rules-engine
Extensible Rules Engine for custom Dataframe / Dataset validation
☆141 · May 7, 2024 · Updated 2 years ago
databricks / drunken-data-quality-1
Spark package for checking data quality
☆26 · Mar 30, 2023 · Updated 3 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,637 · Updated this week
tmalaska / Spark.TableStatsExample
Simple Spark example of generating table stats for use of data quality checks
☆27 · Apr 28, 2017 · Updated 9 years ago
datacleaner / DataCleaner
The premier open source Data Quality solution
☆651 · Jun 30, 2026 · Updated 3 weeks ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,553 · Apr 20, 2026 · Updated 3 months ago
springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47 · Aug 1, 2016 · Updated 9 years ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
jpzk / twitterstream
Twitter Streaming API Example with Kafka Streams in Scala
☆49 · Aug 22, 2016 · Updated 9 years ago
eBay / griffin
Model driven data quality service
☆239 · Dec 4, 2017 · Updated 8 years ago
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆663 · Updated this week
sosuneko / pydqc
Python automatic data quality check toolkit
☆277 · Sep 15, 2020 · Updated 5 years ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆64 · Sep 6, 2024 · Updated last year
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
apache / griffin
Mirror of Apache Griffin
☆1,173 · Aug 3, 2025 · Updated 11 months ago
edmunds / databricks-rest-client
☆28 · Jan 9, 2026 · Updated 6 months ago
brkyvz / lazy-linalg
A package full of linear algebra operators for Apache Spark MLlib's linalg package
☆10 · Sep 9, 2015 · Updated 10 years ago
bikash / DataQuality
Tutorial and examples of Data Quality in Big Data System
☆11 · Apr 25, 2017 · Updated 9 years ago
Neutrinic / flare
Full-stack OpenTelemetry observability for Apache Spark
☆16 · Feb 28, 2026 · Updated 4 months ago
lucidworks / data-quality
Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities
☆26 · Jan 27, 2025 · Updated last year
cloudera-labs / envelope
Build configuration-driven ETL pipelines on Apache Spark
☆162 · Oct 4, 2022 · Updated 3 years ago
AndreSchumacher / avro-parquet-spark-example
An example of using Avro and Parquet in Spark SQL
☆60 · Nov 16, 2015 · Updated 10 years ago
pranab / chombo
Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm
☆106 · Jan 22, 2024 · Updated 2 years ago
ShaneDelmore / imclipitly
Make your project more clippity implicitly with imclipitly
☆17 · Apr 29, 2017 · Updated 9 years ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆457 · Apr 2, 2026 · Updated 3 months ago
typelevel / frameless
Expressive types for Spark.
☆898 · Jul 18, 2026 · Updated last week
etsy / boundary-layer
Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform
☆260 · Jul 19, 2023 · Updated 3 years ago
nerdammer / spark-additions
Utilities for Apache Spark
☆34 · Mar 5, 2016 · Updated 10 years ago
nevillelyh / parquet-extra
A collection of Apache Parquet add-on modules
☆31 · Updated this week
sourav-mazumder / Data-Science-Extensions
☆70 · Mar 15, 2021 · Updated 5 years ago
ExpediaGroup / shunting-yard
Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.
☆20 · Oct 11, 2021 · Updated 4 years ago
ambition119 / QueryParse
SQL parsing and execution: can execute Hive, Spark, and Flink SQL, as well as algorithm SQL for TensorFlow and Deeplearning4j
☆11 · Sep 16, 2022 · Updated 3 years ago
apache / incubator-retired-amaterasu
Apache Amaterasu
☆56 · Oct 18, 2019 · Updated 6 years ago
51zero / eel-sdk
Big Data Toolkit for the JVM
☆147 · Nov 4, 2020 · Updated 5 years ago
yaooqinn / spark-authorizer
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo has been contributed to Apache Kyuubi | The project has been migrated to Apa…
☆183 · Apr 6, 2022 · Updated 4 years ago