AbsaOSS / atumLinks

A dynamic data completeness and accuracy library at enterprise scale for Apache Spark

☆29

Alternatives and similar repositories for atum

Users that are interested in atum are comparing it to the libraries listed below

Sorting:

AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆14Updated 2 years ago
funkyminds / cleanframes
type-class based data cleansing library for Apache Spark SQL
☆78Updated 6 years ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆46Updated 4 months ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆61Updated 2 years ago
CoxAutomotiveDataSolutions / waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
☆76Updated last year
indix / schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
☆113Updated 5 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73Updated 4 years ago
hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆40Updated 3 years ago
AbsaOSS / enceladus
Dynamic Conformance Engine
☆32Updated last month
ing-bank / scruid
Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes.
☆116Updated 4 years ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆62Updated last year
springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47Updated 9 years ago
zalando-incubator / spark-json-schema
JSON schema parser for Apache Spark
☆82Updated 3 years ago
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51Updated 5 years ago
airbnb / sputnik
☆63Updated 6 years ago
indix / sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
☆29Updated 5 years ago
HeartSaVioR / spark-state-tools
Spark Structured Streaming State Tools
☆34Updated 5 years ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow runnig custom code on the executors as they are in…
☆94Updated 6 months ago
apache / incubator-retired-amaterasu
Apache Amaterasu
☆56Updated 6 years ago
target / data-validator
A tool to validate data, built around Apache Spark.
☆100Updated 2 weeks ago
FINRAOS / MegaSparkDiff
A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…
☆52Updated 5 months ago
ZuInnoTe / spark-hadoopoffice-ds
A Spark datasource for the HadoopOffice library
☆37Updated 2 months ago
AbsaOSS / spark-hats
Nested array transformation helper extensions for Apache Spark
☆37Updated 2 years ago
MrPowers / bebe
Filling in the Spark function gaps across APIs
☆50Updated 4 years ago
sparsecode / DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…
☆26Updated 4 years ago
agile-lab-dev / wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging …
☆31Updated last month
smart-data-lake / smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
☆123Updated 2 weeks ago
MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60Updated 3 years ago
mayur2810 / sope
Apache Spark ETL Utilities
☆39Updated last year
51zero / eel-sdk
Big Data Toolkit for the JVM
☆145Updated 5 years ago