yodasco / pyspark-emr
A toolset to streamline running spark python on EMR
☆20 Updated 8 years ago
Related projects
Alternatives and complementary repositories for pyspark-emr
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 Updated 7 years ago
- A simple introduction to using spark ml pipelines ☆26 Updated 6 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 Updated 8 years ago
- Quickstart PySpark with Anaconda on AWS/EMR ☆53 Updated 7 years ago
- REST-like API exposing Airflow data and operations ☆61 Updated 5 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆75 Updated 5 years ago
- Conversion utility from Zeppelin notes to Jupyter notebooks. ☆44 Updated 4 years ago
- Mastering Spark for Data Science, published by Packt ☆46 Updated last year
- Some class materials for a data processing course using PySpark ☆51 Updated last year
- An example PySpark project with pytest ☆17 Updated 7 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 Updated last year
- 🚚 ETL for Spark and Airflow ☆24 Updated 6 years ago
- Real time and offline time series analysis with Spark, Spark Streaming and Storm ☆21 Updated 4 years ago
- Examples for High Performance Spark ☆15 Updated 3 weeks ago
- Coding exercises for Apache Spark ☆104 Updated 9 years ago
- These are some code examples ☆55 Updated 4 years ago
- HDF masterclass materials ☆28 Updated 8 years ago
- event-triggered plugins for airflow ☆21 Updated 4 years ago
- A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR ☆118 Updated 8 years ago
- Make your libraries magically appear in Databricks. ☆47 Updated last year
- Terraform script for launching multiple EMR clusters for training purposes. ☆16 Updated last year
- Simple Spark example of generating table stats for use of data quality checks ☆28 Updated 7 years ago
- Automates Spark standalone cluster tasks with Puppet and Fabric. ☆43 Updated 10 years ago
- Convert a CSV file to ORCFile ☆26 Updated 5 years ago
- Airflow workflow management platform chef cookbook. ☆68 Updated 5 years ago
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different. ☆28 Updated 6 years ago
- Real-world Spark pipelines examples ☆83 Updated 6 years ago