airscholar / SparkingFlow
This project demonstrates how to use Apache Airflow to submit jobs written in different programming languages to an Apache Spark cluster, using Python, Scala, and Java as examples.
☆44 · Updated last year
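Whatever the job's language, the orchestrator ultimately runs a `spark-submit` command against the cluster. The sketch below, which is not taken from the SparkingFlow repository, builds that command for one job per language. The difference it illustrates is that a PySpark job is submitted as a `.py` file, while a Scala or Java job is submitted as a `.jar` together with its entry-point class (`--class`). The master URL `spark://spark-master:7077`, the job names, the file paths, and the class names are all hypothetical placeholders.

```python
"""Sketch: build spark-submit commands for Python, Scala, and Java jobs.

All names below are hypothetical (not from the SparkingFlow repo):
the master URL, job names, application paths, and main classes.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SparkJob:
    name: str
    application: str                  # .py file for PySpark, .jar for Scala/Java
    main_class: Optional[str] = None  # required for .jar apps, unused for .py


def build_spark_submit(job: SparkJob, master: str) -> List[str]:
    """Return the argv list for a spark-submit call for the given job."""
    cmd = ["spark-submit", "--master", master, "--name", job.name]
    if job.application.endswith(".jar"):
        # JVM applications (Scala or Java) must name their entry-point class.
        if not job.main_class:
            raise ValueError(f"{job.name}: a .jar application needs main_class")
        cmd += ["--class", job.main_class]
    elif not job.application.endswith(".py"):
        raise ValueError(f"{job.name}: expected a .py or .jar application")
    # The application path must come after all spark-submit options.
    cmd.append(job.application)
    return cmd


# Hypothetical example jobs, one per language.
JOBS = [
    SparkJob("python-job", "jobs/python/wordcount.py"),
    SparkJob("scala-job", "jobs/scala/wordcount.jar", "WordCount"),
    SparkJob("java-job", "jobs/java/wordcount.jar", "com.example.WordCount"),
]

if __name__ == "__main__":
    for job in JOBS:
        print(" ".join(build_spark_submit(job, "spark://spark-master:7077")))
```

In an Airflow DAG you would normally not build these commands by hand: the Apache Spark provider's `SparkSubmitOperator` takes an `application` path, an optional `java_class` for JVM jobs, and a `conn_id` pointing at the Spark connection, and it runs `spark-submit` for you. The helper above only shows what each of the three submissions boils down to.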
Alternatives and similar repositories for SparkingFlow
Users who are interested in SparkingFlow are comparing it to the repositories listed below.
- This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆41 · Updated last year
- In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our … ☆32 · Updated last year
- This project shows how to capture changes from a Postgres database and stream them into Kafka ☆36 · Updated last year
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆45 · Updated last year
- Resources from the preparation course for the Databricks Data Engineer Professional certification exam ☆117 · Updated this week
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆37 · Updated last year
- ☆51 · Updated last year
- Create streaming data, send it to Kafka, transform it with PySpark, and load it into Elasticsearch and MinIO ☆62 · Updated last year
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en… ☆21 · Updated last year
- End-to-end data engineering project ☆56 · Updated 2 years ago
- Code snippets for Data Engineering Design Patterns book ☆119 · Updated 3 months ago
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka… ☆257 · Updated 4 months ago
- Simple stream processing pipeline ☆102 · Updated last year
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker. ☆96 · Updated 3 months ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 · Updated last year
- ☆87 · Updated 4 months ago
- Near real time ETL to populate a dashboard. ☆72 · Updated last year
- Nyc_Taxi_Data_Pipeline - DE Project ☆110 · Updated 8 months ago
- ☆14 · Updated 2 years ago
- Local Environment to Practice Data Engineering ☆142 · Updated 5 months ago
- Docker with Airflow and Spark standalone cluster ☆258 · Updated last year
- Building a Modern Data Lake with MinIO, Spark, and Airflow via Docker. ☆20 · Updated last year
- ☆40 · Updated 11 months ago
- This repo contains "Databricks Certified Data Engineer Professional" Questions and related docs. ☆78 · Updated 10 months ago
- ☆87 · Updated 2 years ago
- ☆65 · Updated last month
- A series on learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems ☆53 · Updated last year
- End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API… ☆20 · Updated 11 months ago
- Sample project to demonstrate data engineering best practices ☆194 · Updated last year
- Apartments Data Pipeline using Airflow and Spark. ☆21 · Updated 3 years ago