dsaidgovsg / python-spark
Docker image for a Python installation with Spark, Hadoop and Sqoop binaries
☆15 Updated 7 years ago
Alternatives and similar repositories for python-spark
Users that are interested in python-spark are comparing it to the libraries listed below
- Just a boilerplate for PySpark and Flask ☆35 Updated 6 years ago
- The purpose of this tiny project is to put things together with the know-how that I learned from the course big data expert from formacio… ☆62 Updated 6 years ago
- 🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python ☆87 Updated 6 years ago
- Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc ☆51 Updated 8 years ago
- Generalized project for running Airflow DAGs, with possibility of skipping tasks already done for some set of input parameters. ☆15 Updated 2 years ago
- Various data stream/batch process demo with Apache Scala Spark 🚀 ☆11 Updated 5 years ago
- Big Data Demystified meetup and blog examples ☆31 Updated 9 months ago
- running apache spark with docker swarm ☆34 Updated 4 years ago
- Udacity Data Pipeline Exercises ☆15 Updated 5 years ago
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines. ☆47 Updated last year
- Apache Spark docker container image (Standalone mode) ☆35 Updated 4 years ago
- ☆17 Updated 6 years ago
- Challenge for those applying to the Software Engineer, Big Data position ☆35 Updated 13 years ago
- Learn to build a data pipeline with Airflow to automate wrangling data - An Udacity Data Engineer Nano Degree Project ☆8 Updated 5 years ago
- ☆16 Updated 7 years ago
- Set up a 3 node spark cluster using docker containers ☆34 Updated 7 years ago
- Basic tutorial of using Apache Airflow ☆36 Updated 6 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming ☆55 Updated 6 years ago
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆174 Updated last week
- Slack notifications for the Luigi workflow manager ☆46 Updated 3 years ago
- Example of event-driven architecture with FastAPI Gateway, Kafka, Redis pub/sub and Faust-streaming ☆15 Updated 3 years ago
- ☆49 Updated 3 years ago
- A curated list of awesome Databricks resources, including Spark ☆19 Updated 11 months ago
- ☆16 Updated 4 years ago
- Automated testing and deployment of a simple Flask-based (RESTful) micro-service to a production-like environment on AWS, using Docker co… ☆43 Updated 2 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach… ☆19 Updated 8 years ago
- Code that goes along with https://humansofdata.atlan.com/2018/06/apache-airflow-disease-outbreaks-india/ ☆24 Updated last year
- Anomaly Detection model uses Spark for training and Spark Streaming for testing ☆67 Updated 9 years ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆21 Updated 2 years ago
- Simple samples for writing ETL transform scripts in Python ☆22 Updated 3 years ago