yodasco / pyspark-emr
A toolset to streamline running Spark Python jobs on EMR
☆20 · Updated 8 years ago
Alternatives and similar repositories for pyspark-emr:
Users who are interested in pyspark-emr are comparing it to the libraries listed below.
- An example PySpark project with pytest ☆17 · Updated 7 years ago
- Examples for High Performance Spark ☆15 · Updated 4 months ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated 2 months ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 · Updated last year
- Profiles the data, validates the schema, runs data quality checks, and produces a report ☆20 · Updated 5 years ago
- Airflow workflow management platform Chef cookbook ☆71 · Updated 5 years ago
- Coding exercises for Apache Spark ☆104 · Updated 9 years ago
- The open source version of the Amazon Redshift Cluster Management Guide ☆48 · Updated last year
- Simple Spark example of generating table stats for use in data quality checks ☆28 · Updated 7 years ago
- The sane way of building a data layer in Airflow ☆24 · Updated 5 years ago
- Hadoop Data Pipeline using Falcon ☆15 · Updated 8 years ago
- HDF masterclass materials ☆28 · Updated 8 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 · Updated 8 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 · Updated 6 months ago
- Monitor Twitter stream for S&P 500 companies to identify & act on unexpected increases in tweet volume ☆39 · Updated 9 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 · Updated 5 years ago
- This repository is to help with the Partner Demonstration of the Apache Atlas project ☆30 · Updated 9 years ago
- Example unit tests for Apache Spark Python scripts using the py.test framework ☆84 · Updated 8 years ago
- 🚚 ETL for Spark and Airflow ☆24 · Updated 6 years ago
- Quickstart PySpark with Anaconda on AWS/EMR ☆53 · Updated 8 years ago
- Some class materials for a data processing course using PySpark ☆52 · Updated 2 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 · Updated 8 years ago
- Make your libraries magically appear in Databricks ☆47 · Updated last year
- ☆14 · Updated 8 years ago
- Workshop for Hadoop Operations Best Practices ☆10 · Updated 10 years ago
- Skeleton project for Apache Airflow training participants to work on ☆16 · Updated 4 years ago
- Convert a CSV file to ORCFile ☆26 · Updated 5 years ago
- ☆7 · Updated 9 years ago
- Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al. ☆20 · Updated 4 months ago