aws-solutions-library-samples / guidance-for-sql-based-etl-with-apache-spark-on-amazon-eks
Guidance that provides declarative data processing and automated workflow orchestration, helping business users (such as analysts and data scientists) access their data and create meaningful insights without manual IT processes.
☆29 Updated 5 months ago
Related projects:
- This solution helps you deploy ETL jobs on data lake using CDK Pipelines. ☆66 Updated 2 years ago
- This solution helps you deploy Data Lake Infrastructure on AWS using CDK Pipelines. ☆88 Updated 2 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆44 Updated 10 months ago
- Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS ☆23 Updated last week
- ☆72 Updated 10 months ago
- Framework to enforce long term health of your AWS Data Lake by providing visibility into operational, data quality and business metrics. ☆17 Updated 3 years ago
- Build, Test and Deploy ETL solutions using AWS Glue and AWS CDK based CI/CD pipelines ☆36 Updated last year
- The Automated Data Analytics on AWS solution provides an end-to-end data platform for ingesting, transforming, managing and querying data… ☆89 Updated last month
- ☆85 Updated 10 months ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆42 Updated last year
- Replication utility for AWS Glue Data Catalog ☆73 Updated last month
- ☆66 Updated 3 months ago
- ☆26 Updated 3 years ago
- ☆14 Updated 2 months ago
- ☆17 Updated 9 months ago
- ☆16 Updated 5 months ago
- MLOps Pipeline Using SageMaker & CDK, where models are from SageMaker built-in algorithms. ☆23 Updated last month
- ☆50 Updated 2 years ago
- Build DataOps platform with Apache Airflow and dbt on AWS ☆51 Updated 3 years ago
- Repository for AWS Glue Workshop ☆30 Updated last year
- Samples to help you get started with the Amazon Redshift Data API ☆69 Updated last year
- A Data Platform built for AWS, powered by Kubernetes. ☆127 Updated last year
- Best practices and recommendations for getting started with Amazon EMR on EKS. ☆58 Updated 3 weeks ago
- ☆43 Updated 6 months ago
- An open-source framework that simplifies implementation of data solutions. ☆107 Updated this week
- The data product processor is a library for dynamically creating and executing Apache Spark Jobs based on a declarative description of a … ☆14 Updated 4 months ago
- ☆26 Updated last month
- ☆15 Updated 3 years ago
- Operational Data Processing Framework developed using AWS Glue and Apache Hudi. This framework is suitable for Data Lake and Modern Data … ☆21 Updated last year
- ☆156 Updated 6 months ago