ongxuanhong / de02-pyspark-optimization
☆14 Updated 2 years ago
Alternatives and similar repositories for de02-pyspark-optimization
Users that are interested in de02-pyspark-optimization are comparing it to the libraries listed below
- Code for dbt tutorial ☆157 Updated 11 months ago
- Simple stream processing pipeline ☆102 Updated 10 months ago
- Delta-Lake, ETL, Spark, Airflow ☆47 Updated 2 years ago
- This project shows how to capture changes from postgres database and stream them into kafka ☆36 Updated 11 months ago
- ☆16 Updated last year
- This project demonstrates how to use Apache Airflow to submit jobs to Apache spark cluster in different programming languages using Python… ☆42 Updated last year
- Code snippets for Data Engineering Design Patterns book ☆106 Updated last month
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆69 Updated last year
- Course notes for the Astronomer Certification DAG Authoring for Apache Airflow ☆52 Updated last year
- Local Environment to Practice Data Engineering ☆142 Updated 4 months ago
- End to end data engineering project ☆54 Updated 2 years ago
- Series follows learning from Apache Spark (PySpark) with quick tips and workaround for daily problems in hand ☆51 Updated last year
- build dw with dbt ☆44 Updated 6 months ago
- Near real time ETL to populate a dashboard. ☆72 Updated 10 months ago
- Docker with Airflow and Spark standalone cluster ☆257 Updated last year
- Execution of DBT models using Apache Airflow through Docker Compose ☆116 Updated 2 years ago
- ☆12 Updated 4 years ago
- Sample project to demonstrate data engineering best practices ☆190 Updated last year
- Building a Modern Data Lake with Minio, Spark, Airflow via Docker. ☆20 Updated last year
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆46 Updated last year
- ☆265 Updated 6 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆56 Updated last year
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 Updated 2 years ago
- This repo contains a spark standalone cluster on docker for anyone who wants to play with PySpark by submitting their applications. ☆34 Updated last year
- Building a Data Pipeline with an Open Source Stack ☆54 Updated 10 months ago
- ☆130 Updated 3 months ago
- trino monitoring with JMX metrics through Prometheus and Grafana ☆14 Updated 9 months ago
- ☆36 Updated 2 years ago
- Quick Guides from Dremio on Several topics ☆71 Updated 3 months ago
- ☆81 Updated 4 months ago