miztiik / emr-on-eks
Run EMR workloads on EKS
☆13 Updated 3 years ago
Alternatives and similar repositories for emr-on-eks
Users who are interested in emr-on-eks compare it to the libraries listed below
- CLI tool to launch Spark jobs on AWS EMR ☆67 Updated last year
- Code examples for the Introduction to Kubeflow course ☆14 Updated 4 years ago
- Read Delta tables without any Spark ☆47 Updated last year
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆86 Updated last year
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 3 years ago
- Build and deploy a serverless data pipeline on AWS with no effort. ☆111 Updated 2 years ago
- pytest plugin to run the tests with support of pyspark ☆86 Updated 2 months ago
- scaffold of Apache Airflow executing Docker containers ☆86 Updated 2 years ago
- Example unit tests for Apache Spark Python scripts using the py.test framework ☆84 Updated 9 years ago
- A Getting Started Guide for developing and using Airflow Plugins ☆93 Updated 6 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- Deployment tools/scripts for Metaflow! ☆56 Updated 2 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- Unit and integration testing with PySpark can be tough to figure out, let's make that easier. ☆23 Updated 9 years ago
- An example PySpark project with pytest ☆16 Updated 7 years ago
- Fake Pandas / PySpark DataFrame creator ☆48 Updated last year
- pytest support for airflow ☆12 Updated 4 years ago
- Composable filesystem hooks and operators for Apache Airflow. ☆17 Updated 4 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 Updated 7 months ago
- Projects developed by Domino's R&D team ☆78 Updated 3 years ago
- A pyspark lib to validate data quality ☆18 Updated 2 years ago
- T4 is now in production as Quilt 3 ☆64 Updated 6 years ago
- A python wrapper for the KSQL REST API. ☆158 Updated 2 years ago
- Pylint plugin for static code analysis on Airflow code ☆95 Updated 4 years ago
- Asynchronous actions for PySpark ☆47 Updated 3 years ago
- This repository is no longer maintained. ☆15 Updated 3 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 Updated 5 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- Skeleton project for Apache Airflow training participants to work on. ☆17 Updated 5 years ago
- Terraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker… ☆84 Updated 2 years ago