target/data-validator

A tool to validate data, built around Apache Spark.

☆102

Alternatives and similar repositories for data-validator

Users that are interested in data-validator are comparing it to the libraries listed below.

swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 · Mar 14, 2021 · Updated 5 years ago
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆79 · Jun 23, 2019 · Updated 7 years ago
hammerlab / spark-util
low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 · Nov 11, 2022 · Updated 3 years ago
davegurnell / validation
Scala data validation library
☆30 · Aug 14, 2016 · Updated 9 years ago
ronald-smith-angel / owl-data-sanitizer
A pyspark lib to validate data quality
☆19 · Nov 11, 2022 · Updated 3 years ago
scravy / pysparkextra
☆10 · Jun 29, 2021 · Updated 5 years ago
praetorian-inc / gcloud-lockdown
Scripts to demonstrate VPC Service Controls between tenant and shared projects
☆12 · Jun 11, 2019 · Updated 7 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated 3 weeks ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 · May 11, 2026 · Updated 2 months ago
univalence / centrifuge
Data quality tools for Big Data
☆19 · Oct 10, 2019 · Updated 6 years ago
lightcopy / parquet-index
Spark SQL index for Parquet tables
☆134 · May 6, 2021 · Updated 5 years ago
aravinthsci / Spark_Delta_Lake
Delta Lake Examples
☆11 · Apr 24, 2020 · Updated 6 years ago
tharwaninitin / etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for running complex Auditable workflows which can interact with Goo…
☆45 · Aug 26, 2024 · Updated last year
SaurabhChawla100 / spark-radiant
Spark-Radiant is an Apache Spark performance and cost optimizer
☆25 · Dec 31, 2024 · Updated last year
ebonnal / delta-lake-ui
[student project] UI to run SQL on Delta Lake tables and visualize the variations of the result among tables versions
☆12 · Apr 21, 2020 · Updated 6 years ago
aws-samples / aws-emr-apache-ranger
☆24 · Oct 3, 2023 · Updated 2 years ago
MrPowers / bebe
Filling in the Spark function gaps across APIs
☆50 · Apr 14, 2021 · Updated 5 years ago
univalence / spark-tools
☆46 · Apr 27, 2020 · Updated 6 years ago
mjakubowski84 / parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
☆302 · Jul 13, 2025 · Updated last year
CODAIT / aardpfark
A library for exporting Spark ML models and pipelines to PFA
☆55 · Nov 21, 2018 · Updated 7 years ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆47 · Jul 17, 2025 · Updated last year
mrpowers-io / spark-style-guide
Spark style guide
☆271 · Sep 30, 2024 · Updated last year
homeaway / datapull
Cloud based Data Platform based on Apache Spark
☆28 · Jun 30, 2026 · Updated 2 weeks ago
deusaquilus / miniquill
Miniature Quill implementation for Benchmarking and Study
☆18 · Jan 11, 2023 · Updated 3 years ago
hammerlab / spark-tests
Utilities for writing tests that use Apache Spark.
☆24 · Dec 29, 2018 · Updated 7 years ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆458 · Apr 2, 2026 · Updated 3 months ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
lolgab / scala-fullstack
Full stack skeleton project using Akka-http, Scala.js, Laminar, Sloth, Boopickle
☆15 · Sep 1, 2020 · Updated 5 years ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
mayur2810 / sope
Apache Spark ETL Utilities
☆40 · Oct 23, 2024 · Updated last year
sclasen / flange
Scala client for Heroku Doozer
☆14 · Jan 11, 2012 · Updated 14 years ago
zio-mesh / zio-cookbook
Cookbook apps for ZIO
☆31 · Jul 19, 2020 · Updated 6 years ago
MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60 · Feb 22, 2022 · Updated 4 years ago
uber / uberscriptquery
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
☆65 · Dec 17, 2023 · Updated 2 years ago
BenFradet / struct-type-encoder
Deriving Spark DataFrame schemas from case classes
☆44 · Jun 24, 2024 · Updated 2 years ago
smart-data-lake / smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
☆129 · Updated this week
phatak-dev / spark-3.0-examples
Examples of Spark 3.0
☆44 · Nov 11, 2020 · Updated 5 years ago