amanparmar17 / Kafka_Pyspark
Base Kafka producer, consumer, Flask API, and PySpark Structured Streaming job
☆11Updated 3 years ago
Alternatives and similar repositories for Kafka_Pyspark
Users that are interested in Kafka_Pyspark are comparing it to the libraries listed below
- Testing Spark Structured Streaming and Kafka with real data from traffic sensors☆16Updated 2 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python…☆46Updated last year
- Source code of the Apache Airflow Tutorial for Beginners on YouTube Channel Coder2j (https://www.youtube.com/c/coder2j)☆322Updated last year
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka…☆282Updated 8 months ago
- Apache Spark using SQL☆14Updated 4 years ago
- Used Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline☆29Updated last year
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en…☆22Updated last year
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB.☆42Updated 2 years ago
- Example repo to create end-to-end tests for a data pipeline.☆25Updated last year
- Mastering Big Data Analytics with PySpark, Published by Packt☆161Updated last year
- ☆40Updated 2 years ago
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra☆143Updated 2 years ago
- Docker with Airflow and Spark standalone cluster☆260Updated 2 years ago
- Create streaming data, transfer it to Kafka, modify it with PySpark, and write it to Elasticsearch and MinIO☆63Updated 2 years ago
- ☆62Updated 4 years ago
- Project for "Data pipeline design patterns" blog.☆46Updated last year
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging…☆93Updated 6 years ago
- Data Engineering Capstone Project: ETL Pipelines and Data Warehouse Development☆21Updated 6 years ago
- Simple stream processing pipeline☆110Updated last year
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆41Updated last year
- This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, component…☆34Updated last year
- End to end data engineering project☆57Updated 2 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree☆74Updated last year
- Spark, Airflow, Kafka☆26Updated 2 years ago
- Course material for the Data Engineering on AWS course☆29Updated last year
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS)☆17Updated 6 years ago
- PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like…☆134Updated 2 years ago
- Data Engineering Bootcamp☆30Updated 2 months ago
- Code for dbt tutorial☆162Updated last month
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS☆49Updated 6 years ago