aws / aws-emr-best-practices
A best practices guide for using AWS EMR. The guide covers cost, performance, security, operational excellence, and reliability, along with application-specific best practices for Spark, Hive, Hudi, HBase, and more.
☆106 · Updated last month
Alternatives and similar repositories for aws-emr-best-practices
Users who are interested in aws-emr-best-practices are comparing it to the repositories listed below.
- Best practices and recommendations for getting started with Amazon EMR on EKS. ☆63 · Updated last week
- Example code for running Spark and Hive jobs on EMR Serverless. ☆164 · Updated 4 months ago
- ☆40 · Updated 2 months ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆50 · Updated last year
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆85 · Updated 2 years ago
- ☆73 · Updated 11 months ago
- Spark runtime on AWS Lambda ☆107 · Updated 7 months ago
- The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a… ☆220 · Updated last month
- This repository contains the dbt-glue adapter ☆120 · Updated last week
- Replication utility for AWS Glue Data Catalog ☆78 · Updated 9 months ago
- ☆88 · Updated last year
- A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs ☆41 · Updated last year
- Amazon Managed Workflows for Apache Airflow (MWAA) Examples repository contains example DAGs, requirements.txt, plugins, and CloudFormati… ☆115 · Updated 5 months ago
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR ☆36 · Updated 2 months ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆62 · Updated last year
- ☆24 · Updated last year
- Performant Redshift data source for Apache Spark ☆139 · Updated 3 weeks ago
- Samples to help you get started with the Amazon Redshift Data API ☆73 · Updated last year
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats ☆29 · Updated 2 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- Spark ETL example processing New York taxi rides public dataset on EKS ☆44 · Updated 2 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆27 · Updated 8 months ago
- Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog ☆35 · Updated last year
- ☆73 · Updated last year
- Spark Structured Streaming Kinesis Data Streams connector supports both GetRecords and SubscribeToShard (Enhanced Fan-Out, EFO) ☆35 · Updated 2 weeks ago
- An open-source framework that simplifies implementation of data solutions. ☆133 · Updated this week
- Build DataOps platform with Apache Airflow and dbt on AWS ☆55 · Updated 3 years ago
- AWS Glue Schema Registry Client library provides serializers / de-serializers for applications to integrate with AWS Glue Schema Registry… ☆136 · Updated 3 months ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 · Updated last year
- Example applications in Java, Python and SQL for Kinesis Data Analytics, demonstrating sources, sinks, and operators. ☆143 · Updated 11 months ago