aws / aws-emr-best-practices
A best practices guide for using Amazon EMR. The guide covers cost, performance, security, operational excellence, reliability, and application-specific best practices across Spark, Hive, Hudi, HBase, and more.
☆102 Updated this week
Related projects
Alternatives and complementary repositories for aws-emr-best-practices
- Best practices and recommendations for getting started with Amazon EMR on EKS. ☆61 Updated last week
- ☆38 Updated last month
- Example code for running Spark and Hive jobs on EMR Serverless. ☆153 Updated this week
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆83 Updated last year
- ☆66 Updated 5 months ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆48 Updated last year
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 Updated last year
- Spark runtime on AWS Lambda ☆94 Updated 2 months ago
- ☆85 Updated last year
- Replication utility for AWS Glue Data Catalog ☆74 Updated 3 months ago
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR ☆29 Updated 2 months ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆61 Updated last year
- Example applications in Java, Python and SQL for Kinesis Data Analytics, demonstrating sources, sinks, and operators. ☆139 Updated 6 months ago
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆205 Updated 6 months ago
- A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs ☆38 Updated 6 months ago
- ☆26 Updated 3 years ago
- ☆53 Updated last year
- This repository contains the dbt-glue adapter ☆101 Updated this week
- ☆158 Updated 8 months ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆19 Updated 3 months ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 Updated last year
- Samples to help you get started with the Amazon Redshift Data API ☆71 Updated last year
- Amazon Kinesis Data Analytics Flink Starter Kit helps you with the development of Flink Application with Kinesis Stream as a source and A… ☆47 Updated last year
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 2 years ago
- Amazon Managed Service for Apache Flink Benchmarking Utility helps with capacity planning, integration testing, and benchmarking of Amazo… ☆20 Updated last year
- ☆20 Updated 8 months ago
- Performant Redshift data source for Apache Spark ☆136 Updated 3 months ago
- The open source version of the AWS Glue docs. You can submit feedback & requests for changes by submitting issues in this repo or by maki… ☆199 Updated last year
- Reference Architectures for Datalakes on AWS ☆79 Updated 4 years ago
- Amazon Managed Workflows for Apache Airflow (MWAA) Examples repository contains example DAGs, requirements.txt, plugins, and CloudFormati… ☆106 Updated last week