aws / aws-emr-best-practices
A best practices guide for using Amazon EMR. The guide covers cost, performance, security, operational excellence, and reliability, as well as application-specific best practices for Spark, Hive, Hudi, HBase, and more.
☆102 · Updated this week
Related projects
Alternatives and complementary repositories for aws-emr-best-practices
- Example code for running Spark and Hive jobs on EMR Serverless. ☆151 · Updated 2 weeks ago
- Best practices and recommendations for getting started with Amazon EMR on EKS. ☆61 · Updated last week
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆83 · Updated last year
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated last year
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆47 · Updated last year
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆61 · Updated last year
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆205 · Updated 5 months ago
- Spark runtime on AWS Lambda ☆93 · Updated last month
- Replication utility for AWS Glue Data Catalog ☆74 · Updated 3 months ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆17 · Updated 2 months ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 · Updated last year
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆33 · Updated 11 months ago
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR ☆29 · Updated 2 months ago
- Samples to help you get started with the Amazon Redshift Data API ☆71 · Updated last year
- Example applications in Java, Python and SQL for Kinesis Data Analytics, demonstrating sources, sinks, and operators. ☆139 · Updated 5 months ago
- This repository contains the dbt-glue adapter ☆99 · Updated this week
- A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs ☆38 · Updated 5 months ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆76 · Updated 6 years ago
- Amazon Managed Workflows for Apache Airflow (MWAA) Examples repository contains example DAGs, requirements.txt, plugins, and CloudFormati… ☆106 · Updated last month
- Reference Architectures for Datalakes on AWS ☆79 · Updated 4 years ago
- Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects ☆39 · Updated 5 months ago
- Performant Redshift data source for Apache Spark ☆136 · Updated 3 months ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 · Updated last year