mrn-aglic / pyspark-playground

☆90

Alternatives and similar repositories for pyspark-playground

Users who are interested in pyspark-playground compare it to the repositories listed below.

cordon-thiago / airflow-spark
Docker with Airflow and Spark standalone cluster
☆261 Updated 2 years ago
Armaan1Gohil / dataengineering-tech-stack
Local Environment to Practice Data Engineering
☆143 Updated 7 months ago
airscholar / e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka…
☆268 Updated 5 months ago
abeltavares / real-time-data-pipeline
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
☆47 Updated 6 months ago
bartosz25 / data-engineering-design-patterns-book
Code snippets for Data Engineering Design Patterns book
☆142 Updated 4 months ago
mrn-aglic / spark-standalone-cluster
This repo contains a Spark standalone cluster on Docker for anyone who wants to play with PySpark by submitting their applications.
☆35 Updated 2 years ago
josephmachado / efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
☆326 Updated 2 months ago
cluster-apps-on-docker / spark-standalone-cluster-on-docker
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.
☆496 Updated 2 years ago
arezamoosavi / AcidOnSpark-ETL
Delta-Lake, ETL, Spark, Airflow
☆47 Updated 2 years ago
derar-alhussein / Databricks-Certified-Data-Engineer-Professional
Resources for the preparation course for the Databricks Data Engineer Professional certification exam
☆127 Updated last month
alonsomedo / os-data-stack
Building a Data Pipeline with an Open Source Stack
☆55 Updated last month
josephmachado / simple_dbt_project
Code for dbt tutorial
☆159 Updated 2 months ago
josephmachado / data_engineering_best_practices
Sample project to demonstrate data engineering best practices
☆195 Updated last year
josephmachado / data_engineering_project_template
A template repository to create a data project with IAC, CI/CD, Data migrations, & testing
☆271 Updated last year
trannhatnguyen2 / NYC_Taxi_Data_Pipeline
Nyc_Taxi_Data_Pipeline - DE Project
☆115 Updated 9 months ago
dogukannulu / streaming_data_processing
Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO
☆63 Updated 2 years ago
1ambda / lakehouse
Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)
☆59 Updated last year
dominikhei / Local-Data-LakeHouse
Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testin…
☆73 Updated last year
TJaniF / airflow-elt-blueprint
A self-contained, ready-to-run Airflow ELT project. Can be run locally or within Codespaces.
☆74 Updated last year
josephmachado / beginner_de_project_stream
Simple stream processing pipeline
☆103 Updated last year
HamzaG737 / data-engineering-project
End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker.
☆98 Updated 4 months ago
hnawaz007 / pythondataanalysis
Python data repo: Jupyter notebooks, Python scripts, and data.
☆519 Updated 7 months ago
dogukannulu / kafka_spark_structured_streaming
Get data from an API, run a scheduled script with Airflow, send the data to Kafka, consume it with Spark, then write it to Cassandra
☆141 Updated 2 years ago
delta-io / delta-examples
Delta Lake examples
☆227 Updated 10 months ago
thanhENC / e2e-data-platform
End-to-end data platform: a PoC data platform project utilizing a modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,…
☆42 Updated 9 months ago
hnawaz007 / dbt-dw
Build a data warehouse with dbt
☆48 Updated 9 months ago
adidas / lakehouse-engine
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…
☆257 Updated last week
sdesilva26 / docker-spark
Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines
☆133 Updated 2 years ago
Marcel-Jan / docker-hadoop-spark
Multi-container environment with Hadoop, Spark and Hive
☆218 Updated 3 months ago
yTek01 / docker-spark-airflow
☆40 Updated 2 years ago