danielblazevski / airflow-pyspark-redditLinks
Example of using Airflow to schedule downloading data from S3 and launching Spark jobs
☆15Updated 8 years ago
Alternatives and similar repositories for airflow-pyspark-reddit
Users that are interested in airflow-pyspark-reddit are comparing it to the libraries listed below
- Docker Compose files for various Kafka stacks☆32Updated 7 years ago
- This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job will then be submitted to an Apach…☆19Updated 9 years ago
- ☆28Updated 4 years ago
- A Getting Started Guide for developing and using Airflow Plugins☆93Updated 6 years ago
- Terraform module for a PostgreSQL-backed Apache Airflow instance☆24Updated 7 years ago
- A cookiecutter template for Apache Spark applications written in Scala☆10Updated 6 years ago
- ☆10Updated 7 years ago
- Using Luigi to create a machine learning pipeline with the Rossmann Sales data from Kaggle☆33Updated 8 years ago
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow.☆96Updated 4 years ago
- Repo for building a Docker-based Airflow image. Containers support multiple features like writing logs to local or S3 folder and Initializi…☆32Updated 6 years ago
- Dump MySQL tables to S3 and parse them☆31Updated 10 years ago
- Telecom scenarios implemented with streaming techniques☆11Updated 2 years ago
- Spark Application UI extension for JupyterLab☆10Updated 3 years ago
- Cubes OLAP Examples☆74Updated 7 years ago
- Apache Airflow CI pipeline☆19Updated 6 years ago
- An example PySpark project with pytest☆16Updated 7 years ago
- DataHub on AWS demonstration resources☆10Updated 2 years ago
- Setup Apache Airflow on Kubernetes☆10Updated 6 years ago
- ☆26Updated 4 years ago
- Business Data Analysis by HiPIC of CalStateLA☆20Updated 6 years ago
- Helping you get Airflow running in production.☆9Updated 5 years ago
- Source code for 'PySpark Recipes' by Raju Kumar Mishra☆25Updated 5 years ago
- Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc☆51Updated 8 years ago
- Advanced Elasticsearch 7.0, by Packt☆45Updated 2 years ago
- A Luigi-powered analytics / warehouse stack☆88Updated 8 years ago
- CLI tool to launch Spark jobs on AWS EMR☆67Updated last year
- Code repository for Spark for Data Science by Packt☆16Updated 2 years ago
- Udacity Data Pipeline Exercises☆15Updated 5 years ago
- Some class materials for a data processing course using PySpark☆52Updated 2 years ago
- Course materials for my data pipeline video course with O'Reilly☆198Updated 7 years ago