RealKinetic / aws-glue-pipeline-example
An example CI/CD pipeline using GitHub Actions for doing continuous deployment of AWS Glue jobs built on PySpark and Jupyter Notebooks.
☆12 Updated 3 years ago
Related projects:
- A PySpark job to handle upserts, conversion to parquet and create partitions on S3 ☆26 Updated 4 years ago
- Serverless ETL and Analytics with AWS Glue, published by Packt ☆45 Updated 11 months ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆39 Updated 5 years ago
- Example project for consuming AWS Kinesis streaming data and saving it to Amazon Redshift using Apache Spark ☆11 Updated 6 years ago
- Udacity Data Engineer Nano Degree - Project-3 (Data Warehouse) ☆22 Updated 5 years ago
- ☆17 Updated 4 years ago
- Data Engineering with Databricks Cookbook, published by Packt ☆26 Updated 3 months ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) ☆13 Updated 5 years ago
- Snowflake Cookbook, published by Packt ☆72 Updated last year
- Azure Data Engineering Cookbook 2nd-edition, published by Packt ☆31 Updated last year
- Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and mo… ☆25 Updated 3 years ago
- code snippet for analytics sessions ☆31 Updated 2 years ago
- ☆34 Updated last year
- Simplifying Data Engineering and Analytics with Delta, published by Packt ☆20 Updated last year
- Repository for AWS Glue Workshop ☆30 Updated last year
- ☆84 Updated 2 years ago
- DAS-C01 ACG/LA by Brock Tubre and John Hanna ☆125 Updated 9 months ago
- Spark data pipeline that processes movie ratings data. ☆26 Updated last month
- Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects ☆38 Updated 3 months ago
- Git repo to accompany the AWS DevOps Blog: Using AWS DevOps Tools to model and provision AWS Glue workflows ☆17 Updated 2 years ago
- Example repo to create end to end tests for data pipeline. ☆21 Updated 3 months ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 Updated 2 years ago
- Road to Azure Data Engineer Part-II: DP-201 - Designing an Azure Data Solution ☆19 Updated 4 years ago
- Simplify Big Data Analytics with Amazon EMR, published by Packt ☆14 Updated last year
- Data engineering with dbt, published by Packt ☆55 Updated 6 months ago
- AWS Glue tutorial for data developers. ☆23 Updated 5 years ago
- This repository is for demonstrating the capability to do SQL-based UPDATES, DELETES, and INSERTS directly in the Data Lake using Amazon … ☆16 Updated 3 years ago
- Study materials for the AWS Big Data / Data Analytics Specialty Exam ☆26 Updated 2 years ago
- Code Repository for AWS Certified Big Data Specialty 2019 - In Depth and Hands On!, published by Packt ☆38 Updated 10 months ago
- GitHub repository related to the course Mastering Elastic Map Reduce for Data Engineers ☆22 Updated 2 years ago