MoranReznik / PySpark-Reference-Notebook

☆63

Related projects

Alternatives and complementary repositories for PySpark-Reference-Notebook

sidharth1805 / Spotify_etl
☆128 Updated last year
Kevin-Nduati / My-Spotify-Wrapped
I will attempt to create my own Spotify Wrapped by collecting data from the Spotify API, performing transformations, and creating informative d…
☆74 Updated last year
mharty3 / energy_data_capstone
☆30 Updated last year
JesusAcuna / data-engineering-project
☆27 Updated last year
Zachlq / Professional_Portfolio
My current data engineering portfolio. Includes projects spanning ETL, orchestration and dashboarding.
☆103 Updated 7 months ago
MemoonaTahira / MLZoomcamp2022
My repo for the Machine Learning Engineering bootcamp 2022 by DataTalks.Club
☆21 Updated last year
vinamrgrover / ETL-Pipeline-Airflow-Reddit-API
☆23 Updated last year
afaqueahmad7117 / spark-experiments
Ultimate guide for mastering Spark Performance Tuning and Optimization concepts and for preparing for Data Engineering interviews
☆70 Updated 6 months ago
dogukannulu / airflow_kafka_cassandra_mongodb
Produce Kafka messages, consume them and upload into Cassandra, MongoDB.
☆37 Updated last year
eeeds / employees-attrition-mlops
Final Project of the MLOps Zoomcamp hosted by DataTalksClub.
☆25 Updated last year
AndrejaCH / Movies-ETL
For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and a SQL database. The goal is to retri…
☆25 Updated 3 years ago
dogukannulu / kafka_spark_structured_streaming
Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra
☆128 Updated last year
darshilparmar / twitter-airflow-data-engineering-project
YouTube tutorial project
☆94 Updated last year
mharty3 / data_engineering_zoomcamp_2022
☆31 Updated 2 years ago
boisalai / de-zoomcamp-2023
☆41 Updated last year
EcZachly / video-game-training-sql
Hey this is the repo that has all the queries and data for my video game training series!
☆132 Updated 2 years ago
Suwarti / Customer-Segmentation
☆57 Updated 3 years ago
hyunjoonbok / PySpark
PySpark functions and utilities with examples. Assists ETL process of data modeling
☆99 Updated 3 years ago
josephmachado / data_engineering_best_practices
Sample project to demonstrate data engineering best practices
☆166 Updated 8 months ago
itversity / data-engineering-spark
☆86 Updated 2 years ago
AnandDedha / aws-airflow-dataengineering-pipeline
☆18 Updated 10 months ago
DataTalksClub / zoomcamp-analytics
Public data and analytics for our open course
☆30 Updated 7 months ago
kb1907 / PySpark_Projects
PySpark Projects
☆21 Updated 3 weeks ago
honghanhh / coursera-practical-data-science-specialization
Solutions for the Practical Data Science Specialization on Coursera (offered by deeplearning.ai)
☆59 Updated 3 years ago
immu0001 / Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
☆74 Updated 11 months ago
SourabhSinghRana / real-time_crypto_data_pipeline_using_kafka
I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti…
☆28 Updated last year
uhussain / WebCrawlerForOnlineInflation
Price Crawler - Tracking Price Inflation
☆184 Updated 4 years ago
PacktPublishing / Mastering-Big-Data-Analytics-with-PySpark
Mastering Big Data Analytics with PySpark, Published by Packt
☆156 Updated 3 months ago
ArpiteshSrivastava / spotify-data-engineering-project
In this project, we will build an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro…
☆21 Updated last year