A comprehensive Spark guide, collated from multiple sources, that can be used to learn more about Spark or as an interview refresher.
☆690Apr 22, 2022Updated 4 years ago
Alternatives and similar repositories for SparkLearning
Users that are interested in SparkLearning are comparing it to the libraries listed below.
- ☆19Jun 22, 2022Updated 4 years ago
- A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!☆883Apr 16, 2022Updated 4 years ago
- A course by DataTalks Club that covers Spark, Kafka, Docker, Airflow, Terraform, dbt, BigQuery, etc.☆17Mar 18, 2022Updated 4 years ago
- ☆41Nov 19, 2021Updated 4 years ago
- Example end to end data engineering project.☆1,414Dec 8, 2022Updated 3 years ago
- Data Engineering Practice Problems☆2,741Jan 8, 2025Updated last year
- Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tupl…☆814Aug 10, 2025Updated 10 months ago
- Roadmap to becoming a data engineer in 2021☆12,747Jan 25, 2022Updated 4 years ago
- The Data Engineering Cookbook☆15,162Jun 12, 2026Updated 2 weeks ago
- A list of useful resources to learn Data Engineering from scratch☆3,996Jun 19, 2024Updated 2 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster☆495Oct 15, 2024Updated last year
- Implementing best practices for PySpark ETL jobs and applications.☆2,111Jan 1, 2023Updated 3 years ago
- 🐺 Deploy Databases and Services Easily for Development and Testing Pipelines.☆726Jun 23, 2026Updated last week
- Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Jo…☆42,757Jun 10, 2026Updated 2 weeks ago
- Build Your Own Roadmap☆11Jul 8, 2020Updated 5 years ago
- Contains interview questions and solutions☆12May 18, 2018Updated 8 years ago
- Accumulated knowledge and experience in the field of Data Engineering☆872Nov 22, 2022Updated 3 years ago
- A Data Engineering & Machine Learning Knowledge Hub☆1,144Feb 2, 2024Updated 2 years ago
- Query language for efficient data extraction from Wikipedia☆347Feb 16, 2022Updated 4 years ago
- ☆12Sep 23, 2023Updated 2 years ago
- ☆397Jan 26, 2025Updated last year
- This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring…☆1,249Sep 8, 2025Updated 9 months ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.☆350Jan 12, 2022Updated 4 years ago
- A cookbook with the best practices for working with Kubernetes.☆1,502Apr 27, 2026Updated 2 months ago
- A curated list of data engineering tools for software developers☆8,773Jun 22, 2026Updated last week
- Educational notes and hands-on problems with solutions for the Hadoop ecosystem☆87Jan 22, 2019Updated 7 years ago
- Trident provides an easy way to pass the output of one command to any number of targets.☆34Sep 26, 2021Updated 4 years ago
- A few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme…☆1,945Aug 26, 2022Updated 3 years ago
- Personal Data Engineering Projects☆1,021Feb 8, 2023Updated 3 years ago
- More than 2,000 data engineer interview questions.☆1,678Jan 13, 2026Updated 5 months ago
- Fundamentals of Spark with Python (using PySpark), code examples☆365Oct 29, 2022Updated 3 years ago
- Awesome list of data engineering learning materials by subject☆544Jun 9, 2021Updated 5 years ago
- ☆95Sep 14, 2022Updated 3 years ago
- Always know what to expect from your data.☆11,603Updated this week
- PySpark test helper methods with beautiful error messages☆770May 20, 2026Updated last month
- A curated collection of publicly available resources on dbt best practices and how data-driven organizations around the world utilize dbt☆115Feb 28, 2022Updated 4 years ago
- 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.☆29,835Jul 18, 2024Updated last year
- Automatic test case generation for python and static analysis library☆268Mar 28, 2022Updated 4 years ago
- A curated list of references for MLOps☆13,947Nov 21, 2024Updated last year