danielblazevski / airflow-pyspark-reddit
Example of using Airflow to schedule downloading data from S3 and launching Spark jobs
☆15 Updated 8 years ago
Alternatives and similar repositories for airflow-pyspark-reddit:
Users who are interested in airflow-pyspark-reddit are comparing it to the repositories listed below.
- ☆28 Updated 4 years ago
- An example PySpark project with pytest ☆17 Updated 7 years ago
- This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job will then be submitted to an Apach… ☆19 Updated 8 years ago
- Source code for 'PySpark Recipes' by Raju Kumar Mishra ☆25 Updated 5 years ago
- A Getting Started Guide for developing and using Airflow Plugins ☆94 Updated 6 years ago
- Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc. ☆51 Updated 8 years ago
- Slowly Changing Dimension type 2 in Hive query language, using an exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 Updated 5 years ago
- Chef cookbook for the Airflow workflow management platform. ☆69 Updated 5 years ago
- Set up Apache Airflow on Kubernetes ☆9 Updated 6 years ago
- Mastering Spark for Data Science, published by Packt ☆46 Updated 2 years ago
- Apache Airflow CI pipeline ☆18 Updated 5 years ago
- Some AWS EMR examples ☆16 Updated 7 years ago
- Terraform module for a PostgreSQL-backed Apache Airflow instance ☆24 Updated 6 years ago
- Real-world Spark pipeline examples ☆84 Updated 6 years ago
- Labs and data files for a full-day Spark workshop ☆24 Updated last year
- Spark and Python (PySpark) Examples ☆40 Updated 3 years ago
- ☆26 Updated last year
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Docker Compose files for various Kafka stacks ☆32 Updated 6 years ago
- Python Streaming Pipelines with Beam on Flink - Demo ☆14 Updated 2 years ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 Updated 5 years ago
- A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0 ☆25 Updated 3 years ago
- Generate Hive CREATE TABLE statements from JSON data ☆10 Updated 7 years ago
- 🚚 ETL for Spark and Airflow ☆24 Updated 6 years ago
- A simple introduction to using Spark ML pipelines ☆26 Updated 6 years ago
- A curated list of awesome Apache Spark packages and resources. ☆40 Updated 7 years ago
- All Data, Relevant Information, Scripts, and Applications for the Open Data Science Conference (2018) ☆11 Updated 6 years ago
- Business Data Analysis by HiPIC of CalStateLA ☆20 Updated 6 years ago
- Repo for building a Docker-based Airflow image. Containers support multiple features like writing logs to a local or S3 folder and Initializi… ☆32 Updated 5 years ago
- Fully unit tested utility functions for data engineering. Python 3 only. ☆15 Updated 4 months ago