ecloudvalley / Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3
☆17Updated 6 years ago
Alternatives and similar repositories for Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3
Users who are interested in Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3 are comparing it to the repositories listed below
- AWS Big Data Certification☆25Updated 4 months ago
- Glue VSCode devcontainer setup☆14Updated 2 years ago
- Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc☆51Updated 8 years ago
- A self-paced workshop designed to allow you to get hands on with building a real-time data platform using serverless technologies such as…☆22Updated 6 years ago
- ☆16Updated 4 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered…☆16Updated 6 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform☆47Updated 4 months ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A…☆41Updated 2 years ago
- This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job will then be submitted to an Apach…☆19Updated 8 years ago
- Ingest tweets with Kafka. Use Spark to track popular hashtags and trendsetters for each hashtag☆29Updated 9 years ago
- ⭕️ Minimum Viable Machine Learning☆33Updated 4 years ago
- A PySpark job to handle upserts, convert to Parquet, and create partitions on S3☆26Updated 4 years ago
- Spark and Python (PySpark) Examples☆39Updated 3 years ago
- This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS …☆19Updated 3 years ago
- The open source version of the Amazon Redshift Cluster Management Guide.☆48Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆55Updated 2 years ago
- ☆16Updated 7 years ago
- ☆26Updated last year
- As customers move from building data lakes and analytics on AWS to building machine learning solutions, one of their biggest challenges i…☆63Updated 6 years ago
- DataHub on AWS demonstration resources☆10Updated 2 years ago
- ☆9Updated 8 months ago
- Creating a Streaming Pipeline for user log data in Google Cloud Platform☆22Updated 5 years ago
- Code repository for Learning Apache Spark 2, published by Packt☆21Updated 2 years ago
- Mastering Spark for Data Science, published by Packt☆47Updated 2 years ago
- Big Data Demystified meetup and blog examples☆31Updated 9 months ago
- Code and setup information for Introduction to Machine Learning with Spark☆12Updated 9 years ago
- Basic tutorial of using Apache Airflow☆36Updated 6 years ago
- Airflow workflow management platform chef cookbook.☆71Updated 5 years ago
- ☆16Updated 2 years ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r…☆62Updated last year