Python API for Deequ
☆41 Nov 10, 2020 Updated 5 years ago
Alternatives and similar repositories for pydeequ
Users that are interested in pydeequ are comparing it to the libraries listed below.
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆91 Dec 29, 2022 Updated 3 years ago
- Python API for Deequ ☆820 May 9, 2026 Updated last week
- Example to create lineage in Atlas with sqoop and spark ☆14 Apr 5, 2017 Updated 9 years ago
- Type-annotate your spark dataframes and validate them ☆14 Feb 5, 2026 Updated 3 months ago
- A project template for developing BYOD docker images for use in Amazon SageMaker. ☆19 Jan 15, 2020 Updated 6 years ago
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,617 Updated this week
- Simplifies management of Kubernetes Secrets ☆12 Aug 30, 2024 Updated last year
- ☆16 Jun 27, 2020 Updated 5 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆125 May 12, 2026 Updated last week
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 May 13, 2026 Updated last week
- ☆11 Oct 11, 2022 Updated 3 years ago
- AWS Glue Configurable Test Data Generator for S3 Data Lakes and DynamoDB ☆19 Jan 19, 2026 Updated 4 months ago
- Automatic and Interpretable Machine Learning with H2O and LIME ☆11 Feb 21, 2018 Updated 8 years ago
- A Gentle introduction to Machine Learning with Apache Spark ☆11 Mar 2, 2026 Updated 2 months ago
- tmux based cli tool for searching s3 objects using fuzzy search ☆16 Mar 26, 2023 Updated 3 years ago
- A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML ☆15 Dec 24, 2016 Updated 9 years ago
- Easily prevent unnecessary build() calls in StatefulWidget and its subtrees. ☆14 Dec 24, 2021 Updated 4 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆89 Nov 22, 2021 Updated 4 years ago
- ☆12 Oct 16, 2023 Updated 2 years ago
- Basic Spark utilities ☆13 Feb 20, 2025 Updated last year
- An ansible role to install an HA Kubernetes cluster ☆13 Feb 2, 2020 Updated 6 years ago
- SmartFD: Efficient and Scalable Functional Dependency Discovery on Distributed Data-Parallel Platforms ☆18 Aug 23, 2018 Updated 7 years ago
- Demo/Hand-On: Sealed Secrets ☆11 Nov 21, 2019 Updated 6 years ago
- Awesome CLI tool for fetching JWT tokens for OAuth2.0 clients ☆15 Dec 10, 2022 Updated 3 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w… ☆16 Oct 3, 2025 Updated 7 months ago
- API REST boilerplate using Spring Boot and Redis as database ☆13 Dec 26, 2018 Updated 7 years ago
- 🥞 Zero-to-JupyterHub with Kubernetes using an opinionated tech stack ☆12 Aug 12, 2019 Updated 6 years ago
- AWS Step Function Implementation in JS, so you can run your Node.js lambda handlers in your test environments. Made to support Serverless… ☆15 May 27, 2022 Updated 3 years ago
- Spark Implementation of Google Facets Overview https://github.com/PAIR-code/facets ☆56 Oct 16, 2023 Updated 2 years ago
- Bitcoin library inspired by 'Programming Bitcoin' written in Rust ☆16 Jan 6, 2025 Updated last year
- Using Apache Airflow to author, run and monitor complex data pipelines. ☆12 Oct 24, 2018 Updated 7 years ago
- Java concurrent programming and high concurrency solutions. ☆21 May 1, 2018 Updated 8 years ago
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Jul 11, 2018 Updated 7 years ago
- Pyspark Notebook With Docker ☆11 Aug 18, 2015 Updated 10 years ago
- A new way to visualize correlations. ☆15 Jun 21, 2022 Updated 3 years ago
- Display CKAN resource views on dataset and home pages ☆10 Aug 8, 2017 Updated 8 years ago
- A new framework to generate interpretable classification rules ☆18 Feb 11, 2023 Updated 3 years ago
- A demonstration of in memory Web API testing of authentication scenarios ☆11 May 31, 2024 Updated last year
- Fuzzy Data Benchmark ☆18 Feb 8, 2024 Updated 2 years ago