josephmachado/efficient_data_processing_spark

josephmachado / efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

☆390

Alternatives and similar repositories for efficient_data_processing_spark

Users who are interested in efficient_data_processing_spark are comparing it to the repositories listed below.

josephmachado / data_engineering_best_practices
Sample project to demonstrate data engineering best practices
☆221 · Feb 24, 2024 · Updated 2 years ago
josephmachado / beginner_de_project
Beginner data engineering project - batch edition
☆583 · Apr 13, 2026 · Updated 3 months ago
Snowboard-Software / dbt_airbyte_shopify_facebook_paypal_fedex_gls_ecommerce_profitability
This repository is a production dbt pipeline example that models the profitability of an e-commerce business. Data is extracted and loaded…
☆30 · Jun 14, 2024 · Updated 2 years ago
raashidsalih / churn-pipeline
A custom end-to-end analytics platform for customer churn
☆10 · May 15, 2025 · Updated last year
josephmachado / simple_polars_etl
☆16 · Apr 26, 2024 · Updated 2 years ago
im-nsk / Building-an-Automated-Weather-Data-Pipeline-with-Airflow-From-Ingestion-to-Data-Warehouse
This project focuses on building a robust data pipeline using Apache Airflow to automate the ingestion of weather data from the OpenWeath…
☆22 · Feb 3, 2026 · Updated 5 months ago
josephmachado / online_store
End-to-end data engineering project
☆59 · Oct 27, 2022 · Updated 3 years ago
josephmachado / cost_effective_data_pipelines
Cost Efficient Data Pipelines with DuckDB
☆61 · May 14, 2025 · Updated last year
josephmachado / data-quality-w-greatexpectations
Code for data quality with greatexpectations blog
☆13 · Jul 30, 2024 · Updated last year
ryanbrownnetworking777 / dataengineerio-capstone-ryanbrown
capstone project for Dataengineer.io bootcamp Public Repo
☆12 · Feb 20, 2024 · Updated 2 years ago
josephmachado / data_engineering_project_template
A template repository to create a data project with IAC, CI/CD, Data migrations, & testing
☆296 · Jul 11, 2024 · Updated 2 years ago
ssp-data / practical-data-engineering
Practical Data Engineering: A Hands-On Real-Estate Project Guide
☆815 · Jun 25, 2026 · Updated 3 weeks ago
danielbeach / data-engineering-practice
Data Engineering Practice Problems
☆2,791 · Jan 8, 2025 · Updated last year
mattiasthalen / obsidian-insights
Personal project for setting up an open source data warehouse.
☆32 · Jul 11, 2025 · Updated last year
josephmachado / sde_de101_josephmachado
Sample repo for startdataengineering DE 101 free course
☆74 · Jun 24, 2024 · Updated 2 years ago
josephmachado / data_engineering_best_practices_log
Code to demonstrate data engineering metadata & logging best practices
☆22 · Mar 12, 2024 · Updated 2 years ago
nydasco / data-pipeline-demo
A demonstration of an ELT (Extract, Load, Transform) pipeline
☆32 · Feb 19, 2024 · Updated 2 years ago
josephmachado / beginner_de_project_stream
Simple stream processing pipeline
☆112 · Jun 17, 2024 · Updated 2 years ago
josephmachado / adv_data_transformation_in_sql
Code for "Advanced data transformations in SQL" free live workshop
☆93 · May 5, 2025 · Updated last year
josephmachado / data_helper
Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/
☆13 · May 24, 2024 · Updated 2 years ago
bartosz25 / data-engineering-design-patterns-book
Code snippets for Data Engineering Design Patterns book
☆410 · Jun 13, 2026 · Updated last month
josephmachado / local_dev
Local development environment for python data projects, with Docker
☆23 · Dec 14, 2022 · Updated 3 years ago
HamzaG737 / data-engineering-project
End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker.
☆116 · Jan 8, 2026 · Updated 6 months ago
ankurchavda / streamify
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
☆889 · Apr 16, 2022 · Updated 4 years ago
trannhatnguyen2 / NYC_Taxi_Data_Pipeline
Nyc_Taxi_Data_Pipeline - DE Project
☆150 · Oct 21, 2024 · Updated last year
10Kang / DE_Zoomcamp2024_ZY
Repository for Data Engineering Zoomcamp 2024
☆14 · Mar 25, 2024 · Updated 2 years ago
l-mds / local-data-stack
Slow & local data allows you to move fast and deliver business value for the 99.9% of the data challenges.
☆396 · Sep 30, 2025 · Updated 9 months ago
josephmachado / analytical_dp_with_sql
Code for my "Efficient Data Processing in SQL" book.
☆63 · Aug 6, 2024 · Updated last year
josephmachado / python_essentials_for_data_engineers
Learn how Python powers real data engineering. Build end-to-end pipelines from scratch with fully working code and step-by-step video wal…
☆109 · Jul 8, 2026 · Updated last week
dlt-hub / dlt_demos
Demo examples of how to load data from different sources to different destinations
☆31 · Jun 22, 2026 · Updated 3 weeks ago
longNguyen010203 / Youtube-Recommend-Master-ETL-Pipeline
A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc…
☆25 · Nov 19, 2024 · Updated last year
DataExpert-io / data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
☆42,313 · Apr 2, 2026 · Updated 3 months ago
josephmachado / iceberg-features
☆14 · Dec 11, 2023 · Updated 2 years ago
escobar-west / polars-cookbook
Recipes for using Python's polars library
☆277 · Sep 8, 2024 · Updated last year
DataTalksClub / data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo…
☆43,843 · Jun 10, 2026 · Updated last month
kaoutaar / end-to-end-etl-pipeline-jcdecaux-API
velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…
☆21 · Aug 12, 2025 · Updated 11 months ago
JChz6 / DE-ZCamp-Project
☆15 · Mar 29, 2024 · Updated 2 years ago
alanchn31 / Data-Engineering-Projects
Personal Data Engineering Projects
☆1,024 · Feb 8, 2023 · Updated 3 years ago
OBenner / data-engineering-interview-questions
More than 2000 data engineer interview questions.
☆1,694 · Jan 13, 2026 · Updated 6 months ago