RealKinetic / aws-glue-pipeline-example
An example CI/CD pipeline using GitHub Actions for continuous deployment of AWS Glue jobs built on PySpark and Jupyter Notebooks.
☆12 · Updated 4 years ago
Alternatives and similar repositories for aws-glue-pipeline-example
Users that are interested in aws-glue-pipeline-example are comparing it to the libraries listed below
- Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆45 · Updated 5 years ago
- Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark ☆11 · Updated 6 years ago
- Spark data pipeline that processes movie ratings data. ☆28 · Updated last month
- A PySpark job to handle upserts, conversion to Parquet, and partition creation on S3 ☆26 · Updated 4 years ago
- Udacity Data Engineer Nano Degree - Project-3 (Data Warehouse) ☆22 · Updated 5 years ago
- ☆34 · Updated 2 years ago
- All the Snowflake Virtual Warehouse - Example ☆12 · Updated 4 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) ☆16 · Updated 6 years ago
- ☆25 · Updated 4 years ago
- ☆23 · Updated 2 years ago
- Serverless ETL and Analytics with AWS Glue, published by Packt ☆48 · Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 · Updated 2 years ago
- Simplifying Data Engineering and Analytics with Delta, published by Packt ☆21 · Updated last year
- Data Engineering on GCP ☆35 · Updated 2 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆176 · Updated 3 years ago
- code snippet for analytics sessions ☆34 · Updated 3 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Updated 2 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Updated 3 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆103 · Updated 4 years ago
- Snowflake Cookbook, published by Packt ☆79 · Updated 2 years ago
- Spark app to merge different schemas ☆23 · Updated 4 years ago
- Simplify Big Data Analytics with Amazon EMR, published by Packt ☆13 · Updated 2 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … ☆16 · Updated 5 years ago
- CICD pipeline that deploys a dbt image on a GKE cluster ☆11 · Updated 3 years ago
- Git repo to accompany the AWS DevOps Blog: Using AWS DevOps Tools to model and provision AWS Glue workflows ☆20 · Updated 3 years ago
- This repository is for demonstrating the capability to do SQL-based UPDATES, DELETES, and INSERTS directly in the Data Lake using Amazon … ☆16 · Updated 3 years ago
- Udacity Data Engineering Nanodegree Capstone Project ☆36 · Updated 5 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆33 · Updated 4 years ago
- Data Modeling with Snowflake, published by Packt ☆65 · Updated last month
- Azure Data Engineering Cookbook 2nd-edition, published by Packt ☆32 · Updated last year