ankurchavda/SparkLearning

ankurchavda / SparkLearning

A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.

☆690

Alternatives and similar repositories for SparkLearning

Users who are interested in SparkLearning are comparing it to the repositories listed below.

ankurchavda / Python-Interview-Questions
☆19 · Jun 22, 2022 · Updated 4 years ago
ankurchavda / streamify
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
☆889 · Apr 16, 2022 · Updated 4 years ago
ankurchavda / data-engineering-zoomcamp
A course by DataTalks Club that covers Spark, Kafka, Docker, Airflow, Terraform, DBT, Big Query etc
☆17 · Mar 18, 2022 · Updated 4 years ago
rayaroun / Azure-DP203-Preparation
☆41 · Nov 19, 2021 · Updated 4 years ago
damklis / DataEngineeringProject
Example end to end data engineering project.
☆1,417 · Dec 8, 2022 · Updated 3 years ago
danielbeach / data-engineering-practice
Data Engineering Practice Problems
☆2,791 · Jan 8, 2025 · Updated last year
tuplex / tuplex
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tupl…
☆813 · Aug 10, 2025 · Updated 11 months ago
datastacktv / data-engineer-roadmap
Roadmap to becoming a data engineer in 2021
☆12,751 · Jan 25, 2022 · Updated 4 years ago
devmindset / sparkscalainterview
Contain Interview Questions Solutions
☆12 · May 18, 2018 · Updated 8 years ago
andkret / Cookbook
The Data Engineering Cookbook
☆15,181 · Jun 12, 2026 · Updated last month
adilkhash / Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
☆4,002 · Jun 19, 2024 · Updated 2 years ago
cartershanklin / pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
☆497 · Oct 15, 2024 · Updated last year
AlexIoannides / pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
☆2,119 · Jan 1, 2023 · Updated 3 years ago
Clivern / Peanut
🐺 Deploy Databases and Services Easily for Development and Testing Pipelines.
☆726 · Jul 12, 2026 · Updated last week
DataTalksClub / data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo…
☆43,843 · Jun 10, 2026 · Updated last month
ankit-rathi / Self-Starter-Handbook
Build Your Own Roadmap
☆11 · Jul 8, 2020 · Updated 6 years ago
abhishek-ch / around-dataengineering
A Data Engineering & Machine Learning Knowledge Hub
☆1,143 · Feb 2, 2024 · Updated 2 years ago
oleg-agapov / data-engineering-book
Accumulated knowledge and experience in the field of Data Engineering
☆873 · Nov 22, 2022 · Updated 3 years ago
zverok / wikipedia_ql
Query language for efficient data extraction from Wikipedia
☆347 · Feb 16, 2022 · Updated 4 years ago
ABZ-Aaron / reddit-api-pipeline
☆402 · Jan 26, 2025 · Updated last year
harjeet88 / Data_engineering_interview
☆13 · Sep 23, 2023 · Updated 2 years ago
renatootescu / ETL-pipeline
Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.
☆351 · Jan 12, 2022 · Updated 4 years ago
palantir / pyspark-style-guide
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring…
☆1,255 · Sep 8, 2025 · Updated 10 months ago
databricks / LearningSparkV2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
☆1,399 · Jan 28, 2025 · Updated last year
diegolnasc / kubernetes-best-practices
A cookbook with the best practices to working with kubernetes.
☆1,503 · Apr 27, 2026 · Updated 2 months ago
igorbarinov / awesome-data-engineering
A curated list of data engineering tools for software developers
☆8,876 · Updated this week
Pushkr / Apache-Spark-Hands-On
Educational notes,Hands on problems w/ solutions for hadoop ecosystem
☆87 · Jan 22, 2019 · Updated 7 years ago
databricks / devrel
This repository contains the notebooks and presentations we use for our Databricks Tech Talks
☆734 · Jan 6, 2025 · Updated last year
protonradio / player
Proton Player is an HTML5-based streaming music player optimized for compatibility across many devices and browsers.
☆93 · Apr 15, 2026 · Updated 3 months ago
MGessinger / trident
Trident provides an easy way to pass the output of one command to any number of targets.
☆34 · Sep 26, 2021 · Updated 4 years ago
alanchn31 / Data-Engineering-Projects
Personal Data Engineering Projects
☆1,024 · Feb 8, 2023 · Updated 3 years ago
san089 / Udacity-Data-Engineering-Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme…
☆1,953 · Aug 26, 2022 · Updated 3 years ago
OBenner / data-engineering-interview-questions
More than 2000+ Data engineer interview questions.
☆1,694 · Jan 13, 2026 · Updated 6 months ago
itversity / data-engineering-spark
☆95 · Sep 14, 2022 · Updated 3 years ago
tirthajyoti / Spark-with-Python
Fundamentals of Spark with Python (using PySpark), code examples
☆366 · Oct 29, 2022 · Updated 3 years ago
fivetran / great_expectations
Always know what to expect from your data.
☆11,658 · Updated this week
snird / awesome-data-engineering-learning
Awesome list of data engineering learning materials by subject
☆544 · Jun 9, 2021 · Updated 5 years ago
MrPowers / chispa
PySpark test helper methods with beautiful error messages
☆772 · Jul 12, 2026 · Updated last week
eugeneyan / applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
☆29,941 · Jul 18, 2024 · Updated 2 years ago