aviatesk / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆9 · Updated 3 years ago
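As a quick orientation, the snippet below is a minimal sketch of how a "unit test for data" is typically declared with deequ's `VerificationSuite` API; the DataFrame `orders` and its columns `id` and `status` are assumptions made for illustration only.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.DataFrame

// Minimal sketch: run a small set of data-quality checks on a Spark DataFrame.
// `orders`, `id`, and `status` are hypothetical names used only for this example.
def ordersLookHealthy(orders: DataFrame): Boolean = {
  val result = VerificationSuite()
    .onData(orders)
    .addCheck(
      Check(CheckLevel.Error, "basic data quality")
        .hasSize(_ > 0)                                   // dataset is non-empty
        .isComplete("id")                                 // no nulls in id
        .isUnique("id")                                   // id is a unique key
        .isContainedIn("status", Array("open", "closed")) // status holds known values
    )
    .run()

  result.status == CheckStatus.Success
}
```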
Alternatives and similar repositories for deequ:
Users interested in deequ are comparing it to the libraries listed below.
- Paper: A Zero-rename committer for object stores ☆20 · Updated 3 years ago
- ☆22 · Updated 5 years ago
- A Giter8 template for scio ☆31 · Updated last month
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 · Updated 2 years ago
- Dione - a Spark and HDFS indexing library ☆52 · Updated last year
- Mutation testing framework and code coverage for Hive SQL ☆24 · Updated 3 years ago
- Snowplow Enrichment jobs and library ☆22 · Updated last month
- Sketching data structures for Scala, including t-digest ☆15 · Updated 3 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 · Updated 4 years ago
- Collector for cloud-native web, mobile and event analytics, running on AWS and GCP ☆31 · Updated 3 weeks ago
- ☆14 · Updated last month
- Apache Beam Site ☆29 · Updated last month
- Data Sketches for Apache Spark ☆22 · Updated 2 years ago
- Set of tools for backup, compaction, and restoration of Apache Kafka® clusters ☆21 · Updated last week
- A small project to allow publishing data to Apache Kafka, Apache Pulsar, or any other target system ☆14 · Updated 4 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 4 years ago
- A testing framework for Trino ☆26 · Updated this week
- Apache Amaterasu ☆56 · Updated 5 years ago
- Example of a tested Apache Flink application. ☆42 · Updated 5 years ago
- MinIO as local storage and DynamoDB as catalog ☆13 · Updated 10 months ago
- Snowflake Snowpark Java & Scala API ☆20 · Updated this week
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 · Updated 3 months ago
- Provides functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Updated last year
- Embedded PostgreSQL server for use in tests ☆9 · Updated 3 years ago
- Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud Pub/Sub ☆37 · Updated 7 years ago
- Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API. ☆111 · Updated 5 years ago
- A table-schema-less OLAP analytics engine for big data. ☆24 · Updated 11 months ago
- Kafka Streams + Memcached (e.g. AWS ElastiCache) for low-latency in-memory lookups ☆13 · Updated 5 years ago
- Scalable CDC pattern implemented using PySpark ☆18 · Updated 5 years ago
- An example of building a Kubernetes operator (for Flink) using the abstract-operator framework ☆26 · Updated 5 years ago