Docker with Airflow and Spark standalone cluster
☆263 · Aug 5, 2023 · Updated 2 years ago
Alternatives and similar repositories for airflow-spark
Users that are interested in airflow-spark are comparing it to the libraries listed below
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆24 · Apr 2, 2022 · Updated 3 years ago
- ☆41 · Jan 24, 2023 · Updated 3 years ago
- ☆12 · Feb 11, 2022 · Updated 4 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 · Mar 29, 2021 · Updated 4 years ago
- An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR ☆176 · May 28, 2025 · Updated 9 months ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆509 · Nov 7, 2025 · Updated 4 months ago
- Spark application to consume Kafka events generated by a Python producer. ☆12 · Aug 7, 2021 · Updated 4 years ago
- A Docker setup using Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop) ☆13 · May 2, 2021 · Updated 4 years ago
- A recipe for a Docker container-based architecture built on Airflow, Kafka, Spark, and Docker ☆20 · Oct 15, 2024 · Updated last year
- Code for Data Pipelines with Apache Airflow ☆812 · Aug 15, 2024 · Updated last year
- A simple Spark standalone cluster for your testing environment purposes ☆568 · Mar 6, 2024 · Updated 2 years ago
- ☆46 · Jul 6, 2024 · Updated last year
- ☆16 · Jan 19, 2022 · Updated 4 years ago
- Beginner data engineering project - batch edition ☆565 · Jan 22, 2025 · Updated last year
- Simple repo to demonstrate how to submit a Spark job to EMR from Airflow ☆34 · Oct 18, 2020 · Updated 5 years ago
- Spark app to merge different schemas ☆23 · Dec 21, 2020 · Updated 5 years ago
- Execution of DBT models using Apache Airflow through Docker Compose ☆127 · Jan 3, 2023 · Updated 3 years ago
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka… ☆318 · Feb 14, 2025 · Updated last year
- ☆13 · Jul 8, 2025 · Updated 7 months ago
- End to End Sales Streaming Pipeline (FastAPI, Kafka, Spark, Cassandra, MySQL, Superset) ☆10 · May 26, 2023 · Updated 2 years ago
- ☆26 · Nov 22, 2022 · Updated 3 years ago
- Materials for the next course ☆25 · Feb 3, 2023 · Updated 3 years ago
- This project aims to move the data from a relational database system (RDBMS) to a Hadoop file system (HDFS) ☆11 · Apr 29, 2022 · Updated 3 years ago
- A data pipeline with Kafka, Spark Streaming, dbt, Docker, Airflow, and GCP! ☆12 · Jul 6, 2023 · Updated 2 years ago
- ☆11 · Mar 15, 2017 · Updated 8 years ago
- dlt-dagster-demo ☆13 · Nov 6, 2023 · Updated 2 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆162 · Jun 16, 2020 · Updated 5 years ago
- Apache Airflow tutorial ☆974 · Nov 3, 2022 · Updated 3 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 · Nov 22, 2021 · Updated 4 years ago
- The training process for Credit and Risk Assessment Large Language Model (CALM) ☆10 · Oct 15, 2023 · Updated 2 years ago
- ☆14 · Jan 14, 2017 · Updated 9 years ago
- Docker environment to stream data from Kafka to Iceberg tables ☆30 · Feb 27, 2024 · Updated 2 years ago
- Spark Standalone & Livy ☆11 · Jul 13, 2021 · Updated 4 years ago
- Deploy a complete data stack in just a couple of minutes. ☆15 · Mar 6, 2024 · Updated 2 years ago
- Dockerizing and Consuming an Apache Livy environment ☆13 · Jun 29, 2022 · Updated 3 years ago
- An R Shiny module to put swiping interfaces in your app! ☆13 · Mar 29, 2017 · Updated 8 years ago
- A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh. ☆16 · Oct 27, 2022 · Updated 3 years ago
- ☆12 · Mar 17, 2022 · Updated 3 years ago
- PySpark test helper methods with beautiful error messages ☆753 · Feb 25, 2026 · Updated last week