Automated data quality suggestions and analysis with Deequ on AWS Glue
☆93 · Dec 29, 2022 · Updated 3 years ago
Alternatives and similar repositories for amazon-deequ-glue
Users that are interested in amazon-deequ-glue are comparing it to the libraries listed below.
- Python API for Deequ ☆41 · Nov 10, 2020 · Updated 5 years ago
- Python API for Deequ ☆822 · Jun 11, 2026 · Updated 3 weeks ago
- ☆12 · Oct 16, 2023 · Updated 2 years ago
- ☆23 · Oct 3, 2024 · Updated last year
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,625 · Jun 25, 2026 · Updated last week
- Replication utility for AWS Glue Data Catalog ☆80 · Aug 8, 2024 · Updated last year
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 · Oct 17, 2023 · Updated 2 years ago
- ☆11 · Oct 11, 2022 · Updated 3 years ago
- Data Quality Monitoring Tool ☆15 · Dec 5, 2017 · Updated 8 years ago
- A tool to automate analytic platform evaluations. Barometer helps customers to get data points needed for service selection/service confi… ☆19 · Jun 3, 2024 · Updated 2 years ago
- An open source development framework to help you build data workflows and modern data architecture on AWS. ☆271 · Feb 9, 2026 · Updated 4 months ago
- ☆12 · Aug 9, 2024 · Updated last year
- A Singer.io Target for Snowflake ☆11 · Jun 9, 2023 · Updated 3 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆53 · Oct 31, 2023 · Updated 2 years ago
- CLI tool for searching CloudTrail events using fuzzy search ☆18 · Feb 21, 2023 · Updated 3 years ago
- Operational Data Processing Framework developed using AWS Glue and Apache Hudi. This framework is suitable for Data Lake and Modern Data … ☆24 · Sep 6, 2023 · Updated 2 years ago
- Amazon Managed Service for Apache Flink Benchmarking Utility helps with capacity planning, integration testing, and benchmarking of Amazo… ☆21 · Aug 30, 2023 · Updated 2 years ago
- ☆17 · Jul 21, 2025 · Updated 11 months ago
- Amazon Kinesis Data Analytics Flink Starter Kit helps you with the development of Flink Application with Kinesis Stream as a source and A… ☆47 · Aug 30, 2023 · Updated 2 years ago
- Enterprise-grade, production-hardened, serverless data lake on AWS ☆480 · Oct 1, 2025 · Updated 9 months ago
- Amazon SageMaker MLOps deployment pipeline for A/B Testing of machine learning models. ☆45 · Jun 7, 2021 · Updated 5 years ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆20 · May 13, 2020 · Updated 6 years ago
- Imputation of missing values in tables. ☆492 · Jan 14, 2026 · Updated 5 months ago
- A collection of examples built with AWS DataOps Development Kit (DDK) ☆43 · Mar 23, 2026 · Updated 3 months ago
- A custom AWS credential provider that allows your Hadoop or Spark application to access the S3 file system by assuming a role ☆10 · Jan 9, 2026 · Updated 5 months ago
- AWS Glue code samples ☆1,534 · Jun 8, 2026 · Updated 3 weeks ago
- ☆16 · Jan 31, 2022 · Updated 4 years ago
- 🌉 Reference implementation for granting cross-account AWS Glue Data Catalog access from Amazon Athena ☆30 · Jul 25, 2022 · Updated 3 years ago
- ☆16 · Jun 14, 2023 · Updated 3 years ago
- ☆20 · May 21, 2024 · Updated 2 years ago
- Sample code for deploying IAM password policies across a fleet of AWS accounts using CloudFormation StackSets ☆11 · Nov 26, 2021 · Updated 4 years ago
- This repository has configuration files to set up an open-source tool named Okta AWS CLI Assume Role Tool (https://github.com/oktadevelop… ☆10 · May 18, 2020 · Updated 6 years ago
- ☆25 · Jul 2, 2018 · Updated 8 years ago
- The open source version of the AWS Glue docs. You can submit feedback & requests for changes by submitting issues in this repo or by maki… ☆201 · Jun 15, 2023 · Updated 3 years ago
- Framework to enforce long term health of your AWS Data Lake by providing visibility into operational, data quality and business metrics. ☆31 · Aug 19, 2021 · Updated 4 years ago
- This solution helps you deploy ETL jobs on data lake using CDK Pipelines. ☆69 · Aug 9, 2022 · Updated 3 years ago
- Snapshot manager for Amazon Kinesis Data Analytics for Apache Flink helps the users to generate a snapshot on a periodic basis. ☆19 · Aug 30, 2023 · Updated 2 years ago
- A toolset to streamline running spark python on EMR ☆20 · Nov 16, 2016 · Updated 9 years ago
- This repository contains sample code that is used to demonstrate building, deploying and invoking a SageMaker model for heart disease pre… ☆10 · Oct 14, 2020 · Updated 5 years ago