yodasco / pyspark-emr
A toolset to streamline running Spark Python on EMR
☆20 · Updated 8 years ago
Alternatives and similar repositories for pyspark-emr:
Users who are interested in pyspark-emr are comparing it to the libraries listed below.
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated 2 months ago
- Airflow workflow management platform chef cookbook. ☆71 · Updated 5 years ago
- An example PySpark project with pytest ☆17 · Updated 7 years ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 · Updated 8 years ago
- HDF masterclass materials ☆28 · Updated 8 years ago
- Some class materials for a data processing course using PySpark ☆52 · Updated 2 years ago
- Quickly get a kubernetes executor airflow environment provisioned on GKE. Azure Kubernetes Service instructions included also as are inst… ☆36 · Updated 4 years ago
- A K8s-based infrastructure for analytics ☆24 · Updated 5 years ago
- Make your libraries magically appear in Databricks. ☆47 · Updated last year
- CLI tool to launch Spark jobs on AWS EMR ☆67 · Updated last year
- Quickstart PySpark with Anaconda on AWS/EMR ☆53 · Updated 8 years ago
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 · Updated last year
- Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al. ☆20 · Updated 4 months ago
- Composable filesystem hooks and operators for Apache Airflow. ☆17 · Updated 3 years ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆75 · Updated 6 years ago
- Code to be contributed to the Apache Airflow (incubating) project for ETL workflow management for integrating with the Snowflake Data War… ☆25 · Updated 7 years ago
- A pyspark lib to validate data quality ☆18 · Updated 2 years ago
- AWS Big Data Certification ☆25 · Updated 2 months ago
- Profiles the data, validates the schema and runs data quality checks and produces a report ☆20 · Updated 5 years ago
- Examples for High Performance Spark ☆15 · Updated 4 months ago
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆28 · Updated 7 years ago
- ☆54 · Updated 7 years ago
- Python API for Deequ ☆41 · Updated 4 years ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. ☆96 · Updated 4 years ago
- Hadoop Data Pipeline using Falcon ☆15 · Updated 8 years ago
- The open source version of the Amazon EMR Management Guide. You can submit feedback & requests for changes by submitting issues in this r… ☆62 · Updated last year
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 · Updated 8 years ago
- AWS bootstrap scripts for Mozilla's flavoured Spark setup. ☆47 · Updated 5 years ago
- ☆10 · Updated 6 years ago