agile-lab-dev/DataQuality

DataQuality for BigData

☆149

Alternatives and similar repositories for DataQuality

Users that are interested in DataQuality are comparing it to the libraries listed below.

FRosner / drunken-data-quality
Spark package for checking data quality
☆220 · Feb 28, 2020 · Updated 6 years ago
piotr-kalanski / data-quality-monitoring
Data Quality Monitoring Tool
☆15 · Dec 5, 2017 · Updated 8 years ago
agile-lab-dev / darwin
Avro Schema Evolution made easy
☆36 · Feb 8, 2024 · Updated 2 years ago
ubisoft / mobydq
Tool to automate data quality checks on data pipelines
☆257 · Sep 10, 2022 · Updated 3 years ago
datacleaner / DataCleaner
The premier open source Data Quality solution
☆651 · Jun 30, 2026 · Updated 3 weeks ago
bikash / DataQuality
Tutorial and examples of Data Quality in Big Data System
☆11 · Apr 25, 2017 · Updated 9 years ago
isarn / isarn-sketches
Sketching data structures for scala, including t-digest
☆15 · Sep 7, 2021 · Updated 4 years ago
sosuneko / pydqc
python automatic data quality check toolkit
☆277 · Sep 15, 2020 · Updated 5 years ago
sev7e0 / wow-spark
Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and delta-lake, along with basic Scala exercises and source-code analyses (for example master and shuffle), summaries, and translations.
☆18 · Jul 19, 2023 · Updated 3 years ago
PasaLab / SmartFD
SmartFD: Efficient and Scalable Functional Dependency Discovery on Distributed Data-Parallel Platforms
☆19 · Aug 23, 2018 · Updated 7 years ago
agile-lab-dev / witboost-starter-kit
Witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. The Starter Kit showcases the …
☆27 · May 22, 2026 · Updated last month
lisehr / dq-meerkat
Automated Continuous Data Quality Measurement
☆12 · Nov 15, 2023 · Updated 2 years ago
lucidworks / data-quality
Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities
☆26 · Jan 27, 2025 · Updated last year
timgent / data-flare
Data quality control tool built on spark and deequ
☆25 · May 9, 2026 · Updated 2 months ago
monolive / ambari-custom-alerts
Custom Alerts for Ambari server
☆12 · Jul 27, 2015 · Updated 10 years ago
apache / griffin
Mirror of Apache griffin
☆1,169 · Aug 3, 2025 · Updated 11 months ago
GoogleCloudPlatform / dataproc-pubsub-spark-streaming
☆31 · Oct 17, 2018 · Updated 7 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,635 · Updated this week
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆662 · Jul 13, 2026 · Updated last week
aravinthsci / Spark_Delta_Lake
Delta Lake Examples
☆11 · Apr 24, 2020 · Updated 6 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 · Mar 14, 2021 · Updated 5 years ago
hammerlab / spark-util
low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
Teradata / kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies…
☆1,111 · Jan 12, 2023 · Updated 3 years ago
CeON / spark-utils
Practical utilities for spark applications
☆11 · Feb 26, 2026 · Updated 4 months ago
spbail / data-quality-tools
Content for a talk on "The wonderful world of data quality tools in Python"
☆18 · May 5, 2021 · Updated 5 years ago
WeBankFinTech / Qualitis
Qualitis is a one-stop data quality management platform that supports quality verification, notification, and management for various data…
☆764 · Apr 2, 2026 · Updated 3 months ago
microsoft / Data-Quality-Rule-Engine
☆24 · Apr 21, 2023 · Updated 3 years ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated 3 weeks ago
smart-data-lake / smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
☆129 · Updated this week
ZuInnoTe / spark-hadoopoffice-ds
A Spark datasource for the HadoopOffice library
☆36 · Sep 29, 2025 · Updated 9 months ago
bytedance / clickhouse_hadoop
Import data from clickhouse to hadoop with pure SQL
☆34 · Mar 19, 2019 · Updated 7 years ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆458 · Apr 2, 2026 · Updated 3 months ago
sutugin / spark-streaming-jdbc-source
☆26 · Apr 15, 2021 · Updated 5 years ago
Impetus / jumbune
Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http:…
☆73 · Jan 1, 2023 · Updated 3 years ago
hammerlab / magic-rdds
Miscellaneous functionality for manipulating Apache Spark RDDs.
☆22 · Dec 29, 2018 · Updated 7 years ago
NYUBigDataProject / SparkClean
A Scalable Data Cleaning Library for PySpark.
☆29 · Apr 4, 2019 · Updated 7 years ago
hurtn / databricks
☆12 · Aug 6, 2020 · Updated 5 years ago
tupol / spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
☆36 · Dec 15, 2024 · Updated last year
bomeng / Heracles
High performance HBase / Spark SQL engine
☆28 · Jul 7, 2022 · Updated 4 years ago