nanlabs / aws-glue-etl-boilerplate
A complete example of an AWS Glue application that uses the Serverless Framework to deploy the infrastructure, and DevContainers and/or Docker Compose to run the application locally with AWS Glue Libs, Spark, Jupyter Notebook, and the AWS CLI, among other tools. It includes both Python Shell and PySpark jobs.
☆18 Updated 7 months ago
Alternatives and similar repositories for aws-glue-etl-boilerplate:
Users who are interested in aws-glue-etl-boilerplate are comparing it to the repositories listed below.
- This repository contains various frontend-related resources, such as applications, examples, libraries, and tools. ☆17 Updated 9 months ago
- This repository contains React components, hooks, apps, and libraries that are used in different projects here at NaN Labs. ☆23 Updated 3 months ago
- This is a curated list of all the open source examples and projects we have at NaNLABS. ☆20 Updated this week
- This repository contains different infrastructure components, CI/CD pipelines, automation tools, among other resources that are used in di… ☆45 Updated 7 months ago
- This is a plugin for the Serverless Framework that makes it possible to deploy AWS Glue Jobs and Triggers. ☆25 Updated last week
- A Terraform module that creates an Airflow instance in AWS ECS. ☆60 Updated last year
- Extract, transform, and load data for analytic processing using AWS Glue. ☆17 Updated 3 years ago
- An open-source framework that simplifies implementation of data solutions. ☆123 Updated this week
- Data Lake as Code, featuring ChEMBL and OpenTargets. ☆169 Updated last year
- This solution helps you deploy ETL jobs on a data lake using CDK Pipelines. ☆67 Updated 2 years ago
- The open source version of the AWS Glue docs. You can submit feedback & requests for changes by submitting issues in this repo or by maki… ☆198 Updated last year
- This solution helps you deploy Data Lake Infrastructure on AWS using CDK Pipelines. ☆94 Updated 2 years ago
- Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS. ☆29 Updated this week
- Terraform code to deploy a SageMaker domain in VPC-only mode that supports multiple Studio and Canvas features. ☆19 Updated last year
- Spark runtime on AWS Lambda. ☆105 Updated 5 months ago
- Example code for running Spark and Hive jobs on EMR Serverless. ☆161 Updated last month
- An open source development framework to help you build data workflows and modern data architecture on AWS. ☆261 Updated this week
- Script for quickly creating AWS Lambda Layers. ☆45 Updated 3 years ago
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows. ☆9 Updated this week
- amazon-sagemaker-cdk-examples uses AWS CDK to simplify common architectures in machine learning operations using SageMaker and other AWS s… ☆68 Updated 10 months ago
- Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the… ☆242 Updated last week
- Resources for video demonstrations and blog posts related to DataOps on AWS. ☆172 Updated 3 years ago
- Automated data quality suggestions and analysis with Deequ on AWS Glue. ☆84 Updated 2 years ago