dsaidgovsg / python-spark
Docker image for a Python installation with Spark, Hadoop and Sqoop binaries
★15, updated 6 years ago
Related projects
Alternatives and complementary repositories for python-spark
- Just a boilerplate for PySpark and Flask (★35, updated 6 years ago)
- Simple, self-contained fraud detection system built with Apache Kafka and Python (★83, updated 5 years ago)
- Repo for all my code on the articles I post on Medium (★105, updated 2 years ago)
- Use Airflow to move data from multiple MySQL databases to BigQuery (★99, updated 4 years ago)
- A Scalable Data Cleaning Library for PySpark. (★26, updated 5 years ago)
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A Python job will then be submitted to a Apach… (★19, updated 8 years ago)
- Challenge for those applying to the Software Engineer, Big Data position (★34, updated 13 years ago)
- Code to build a simple analytics data pipeline with Python (★102, updated 7 years ago)
- (★16, updated 6 years ago)
- Basic tutorial of using Apache Airflow (★35, updated 6 years ago)
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… (★53, updated last year)
- Airflow training for the crunch conf (★105, updated 6 years ago)
- Helping you get Airflow running in production. (★9, updated 5 years ago)
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines. (★46, updated last year)
- Example of an ETL Pipeline using Airflow (★32, updated 7 years ago)
- Udacity Data Pipeline Exercises (★15, updated 4 years ago)
- PySpark Code for Hands-on Learners (★114, updated 5 years ago)
- Using Luigi to create a Machine Learning Pipeline using the Rossman Sales data from Kaggle (★33, updated 8 years ago)
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR (★173, updated last year)
- Curated list of awesome tools and libraries for specific domains (★35, updated this week)
- Processing tweets using Spark Streaming and identifying top trending hashtags using a real-time simple dashboard (★42, updated 2 years ago)
- A curated list of awesome Databricks resources, including Spark (★14, updated 4 months ago)
- This repo holds all the content for the Machine Learning training day (★22, updated 2 years ago)
- Docker container for Kafka - Spark Streaming - Cassandra (★97, updated 5 years ago)
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator (★75, updated 5 years ago)
- This is a simple streaming application that utilises Kafka and Python (★45, updated 5 years ago)
- A cookiecutter template for Apache Spark applications written in Scala (★10, updated 5 years ago)
- A curated list of all the awesome examples, articles, tutorials and videos for Apache Airflow. (★96, updated 3 years ago)