Anant / example-airflow-and-spark
☆12 · Updated 3 years ago
Alternatives and similar repositories for example-airflow-and-spark:
Users interested in example-airflow-and-spark are comparing it to the repositories listed below.
- End-to-end Kafka Streaming Examples on Databricks with Evolving Avro Schemas. ☆9 · Updated last year
- Docker with Airflow and Spark standalone cluster ☆255 · Updated last year
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP! ☆12 · Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 · Updated 4 years ago
- Delta-Lake, ETL, Spark, Airflow ☆47 · Updated 2 years ago
- ☆86 · Updated 2 months ago
- Ravi Azure ADB ADF Repository ☆66 · Updated 3 months ago
- Resources from the preparation course for the Databricks Data Engineer Professional certification exam ☆112 · Updated 3 weeks ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆175 · Updated 3 years ago
- ☆87 · Updated 2 years ago
- 📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems. ☆43 · Updated 3 months ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 · Updated last year
- Guide for the Databricks Spark certification ☆58 · Updated 3 years ago
- Local Environment to Practice Data Engineering ☆142 · Updated 3 months ago
- PySpark functions and utilities with examples. Assists the ETL process for data modeling. ☆101 · Updated 4 years ago
- Repository related to Spark SQL and Pyspark using Python3 ☆37 · Updated 2 years ago
- Unit testing using Databricks Connect ☆31 · Updated 3 years ago
- A series that follows learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆49 · Updated last year
- In this project, we set up an end-to-end data engineering using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our … ☆27 · Updated last year
- This project is for demonstrating knowledge of Data Engineering tools and concepts and also learning in the process ☆46 · Updated 2 years ago
- PySpark Cheatsheet ☆36 · Updated 2 years ago
- ☆50 · Updated last year
- Data Engineering with Google Cloud Platform, published by Packt ☆117 · Updated last year
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆22 · Updated 2 years ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆244 · Updated 2 months ago
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆23 · Updated 3 years ago
- A template repository to create a data project with IAC, CI/CD, Data migrations, & testing ☆260 · Updated 9 months ago
- Companion repository that goes along with Snowflake's "Introduction to Modern Data Engineering with Snowflake" course on Coursera ☆48 · Updated 2 months ago
- Git Repository ☆140 · Updated 2 months ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆143 · Updated 4 years ago