aminelemaizi / micro-cluster-lab
A micro cluster lab to experiment with Dask and Spark (Python and Scala) based on Docker
☆15 · Updated last year
Related projects
Alternatives and complementary repositories for micro-cluster-lab
- ☆8 · Updated last month
- A template repository to create a data project with IaC, CI/CD, data migrations, and testing ☆237 · Updated 4 months ago
- ETL pipeline using PySpark (Spark with Python) ☆108 · Updated 4 years ago
- Tutorial for setting up a Spark cluster running inside Docker containers located on different machines ☆123 · Updated 2 years ago
- Spark development environment for Kubernetes, spark-submit, and Jupyter Notebook ☆19 · Updated 2 years ago
- PySpark Cheat Sheet: example code to help you learn PySpark and develop apps faster ☆422 · Updated 3 weeks ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆80 · Updated 5 years ago
- Fundamentals of Spark with Python (using PySpark), code examples ☆331 · Updated 2 years ago
- Docker with Airflow and a Spark standalone cluster ☆244 · Updated last year
- ☆68 · Updated 5 months ago
- Multi-container environment with Hadoop, Spark, and Hive ☆202 · Updated 10 months ago
- RedditR for Content Engagement and Recommendation ☆21 · Updated 6 years ago
- My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture, that aggrega… ☆487 · Updated 2 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… ☆18 · Updated 2 months ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆20 · Updated 2 years ago
- A Docker setup using Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop) ☆11 · Updated 3 years ago
- Open Source LeetCode for PySpark, Spark, Pandas, and DBT/Snowflake ☆100 · Updated 2 months ago
- ☆22 · Updated last year
- Apartments data pipeline using Airflow and Spark ☆18 · Updated 2 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment ☆39 · Updated 3 years ago
- Apache Spark 3: Structured Streaming course material ☆119 · Updated last year
- PySpark Tutorial for Beginners: practical examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like… ☆80 · Updated last year
- Airflow training for the Crunch conf ☆105 · Updated 6 years ago
- 😈 Complete end-to-end ETL pipeline with Spark, Airflow, and AWS ☆43 · Updated 5 years ago
- ☆22 · Updated 8 months ago
- Simple stream processing pipeline ☆91 · Updated 4 months ago
- This project provides Apache Spark SQL, RDD, DataFrame, and Dataset examples in Scala ☆559 · Updated 7 months ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow ☆296 · Updated 2 years ago
- 🐍 Quick reference guide to common patterns and functions in PySpark ☆447 · Updated last year
- Projects done in the Data Engineer Nanodegree Program by Udacity.com ☆94 · Updated last year