miztiik / emr-on-eks
Run EMR workloads on EKS
☆13 Updated 4 years ago
Alternatives and similar repositories for emr-on-eks
Users who are interested in emr-on-eks are comparing it to the libraries listed below.
- Read Delta tables without any Spark ☆47 Updated last year
- Code examples for the Introduction to Kubeflow course ☆14 Updated 4 years ago
- ☆16 Updated 5 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 Updated last year
- scaffold of Apache Airflow executing Docker containers ☆86 Updated 2 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline ☆75 Updated 2 years ago
- Pylint plugin for static code analysis on Airflow code ☆96 Updated 4 years ago
- Bare minimal Airflow on Kubernetes (Local, EKS, AKS) ☆53 Updated 5 years ago
- Asynchronous actions for PySpark ☆47 Updated 3 years ago
- A Getting Started Guide for developing and using Airflow Plugins ☆93 Updated 6 years ago
- Composable filesystem hooks and operators for Apache Airflow. ☆17 Updated 4 years ago
- pytest support for airflow ☆12 Updated 4 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 Updated 10 months ago
- Helpers & syntactic sugar for PySpark. ☆62 Updated 2 years ago
- Skeleton project for Apache Airflow training participants to work on. ☆17 Updated 5 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated 9 months ago
- Fake Pandas / PySpark DataFrame creator ☆48 Updated last year
- Build and deploy a serverless data pipeline on AWS with no effort. ☆111 Updated 2 years ago
- Functional Airflow DAG definitions. ☆38 Updated 8 years ago
- Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics ☆65 Updated last year
- A CLI to manage and monitor permissions in AWS Lake Formation ☆25 Updated 2 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 3 years ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆85 Updated last year
- Dask integration for Snowflake ☆30 Updated 2 months ago
- Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce. ☆38 Updated 3 years ago
- An example PySpark project with pytest ☆17 Updated 8 years ago
- ☆23 Updated 4 years ago
- pytest plugin to run the tests with support of pyspark ☆87 Updated 4 months ago
- Filling in the Spark function gaps across APIs ☆50 Updated 4 years ago