margitaii/pydeequ

Python API for Deequ

☆41

Alternatives and similar repositories for pydeequ

Users who are interested in pydeequ are comparing it to the libraries listed below.

aws-samples / amazon-deequ-glue
Automated data quality suggestions and analysis with Deequ on AWS Glue
☆93 · Dec 29, 2022 · Updated 3 years ago
awslabs / python-deequ
Python API for Deequ
☆823 · Updated this week
shwethags / atlas-lineage
Example to create lineage in Atlas with sqoop and spark
☆14 · Apr 5, 2017 · Updated 9 years ago
aws-samples / data-profiler-for-aws-glue-data-catalog
Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep…
☆20 · May 13, 2020 · Updated 6 years ago
aws-samples / amazon-sagemaker-BYOD-template
A project template for developing BYOD docker images for use in Amazon SageMaker.
☆19 · Jan 15, 2020 · Updated 6 years ago
rdjondo / TensorFlowGPUonUbuntu
Installing Tensorflow on Ubuntu 16.04 (Tested with Alienware Aurora R5)
☆11 · Apr 15, 2017 · Updated 9 years ago
svakulenk0 / tweet2vec_clustering
☆26 · Oct 17, 2017 · Updated 8 years ago
ritchie46 / serverless-model-aws
Deploy any Machine Learning model serverless in AWS.
☆23 · Oct 17, 2018 · Updated 7 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,635 · Updated this week
homeaway / datapull
Cloud based Data Platform based on Apache Spark
☆28 · Jun 30, 2026 · Updated 3 weeks ago
richardanaya / spark_delta_lake
☆16 · Jun 27, 2020 · Updated 6 years ago
awslabs / amazon-s3-tagging-spark-util
☆12 · Oct 16, 2023 · Updated 2 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
aws-samples / amazon-emr-optimize-data-processing
Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark
☆14 · Apr 14, 2023 · Updated 3 years ago
piotr-kalanski / data-quality-monitoring
Data Quality Monitoring Tool
☆15 · Dec 5, 2017 · Updated 8 years ago
kellnr / helm
Helm chart to deploy Kellnr on kubernetes
☆15 · Jul 2, 2026 · Updated 2 weeks ago
woobe / lime_water
Automatic and Interpretable Machine Learning with H2O and LIME
☆11 · Feb 21, 2018 · Updated 8 years ago
jamartinh / Orange3-Spark
A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML
☆15 · Dec 24, 2016 · Updated 9 years ago
blockchain-etl / etl-rust
☆11 · Nov 5, 2024 · Updated last year
daskos / mentor
Extensible Python Framework for Apache Mesos
☆33 · Oct 19, 2017 · Updated 8 years ago
mark-hoffmann / fastteradata
Tools for faster and optimized interaction with Teradata and large datasets.
☆17 · Jul 11, 2018 · Updated 8 years ago
semantalytics / awesome-druid
☆62 · May 29, 2019 · Updated 7 years ago
beamlynx / pine-lang
Writing SQL can be easier - pine makes it happen!
☆12 · May 21, 2026 · Updated last month
aws-samples / sample-amazon-bedrock-reliability-patterns
☆16 · Nov 18, 2025 · Updated 8 months ago
beacon50 / SimpleJDBC
A JDBC driver for Amazon's SimpleDB
☆11 · Apr 17, 2014 · Updated 12 years ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
schelterlabs / jenga
Jenga is an experimentation library that allows data science practitioners and researchers to study the effect of common data corruptio…
☆43 · Jun 21, 2023 · Updated 3 years ago
ronald-smith-angel / owl-data-sanitizer
A pyspark lib to validate data quality
☆19 · Nov 11, 2022 · Updated 3 years ago
parente / z2jh-aws
🥞 Zero-to-JupyterHub with Kubernetes using an opinionated tech stack
☆12 · Aug 12, 2019 · Updated 6 years ago
daskos / daskos
Apache Mesos backend for Dask scheduling library
☆28 · Oct 19, 2017 · Updated 8 years ago
divyam-rai / simple-kafka-sasl-docker-python
Due to lack of resources on how to deploy kafka with simple SASL authentication (just username and password) and how to write producer an…
☆12 · Dec 29, 2021 · Updated 4 years ago
gopro / facets-overview-spark
Spark Implementation of Google Facets Overview https://github.com/PAIR-code/facets
☆56 · Oct 16, 2023 · Updated 2 years ago
aljoscha / blog
Thoughts on things I find interesting.
☆17 · Dec 19, 2024 · Updated last year
RongleXie / concurrency
Java concurrent programming and high concurrency solutions.
☆21 · May 1, 2018 · Updated 8 years ago
takuya-takeuchi / MXNetDotNet
.NET wrapper for Apache MXNet written in C#
☆13 · Feb 16, 2020 · Updated 6 years ago
suhailrehman / fuzzydata
Fuzzy Data Benchmark
☆18 · Feb 8, 2024 · Updated 2 years ago
qubole / spark-state-store
Rocksdb state storage implementation for Structured Streaming.
☆17 · Oct 21, 2020 · Updated 5 years ago
Strata-Labs / BitScript
☆15 · Updated this week