This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We use PySpark and Spark SQL for development. At the end of the course, we also cover a few case studies.
☆104 · Sep 26, 2025 · Updated 5 months ago
Alternatives and similar repositories for ApacheSpark
Users who are interested in ApacheSpark are comparing it to the libraries listed below.
- Airflow & DBT Cloud Integrated Project Presented at Lagos DBT Community Meetup & DataFestAfrica 23 ☆13 · Oct 11, 2023 · Updated 2 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) ☆17 · Dec 18, 2018 · Updated 7 years ago
- Simple ETL pipeline using Python ☆29 · May 22, 2023 · Updated 2 years ago
- pyspark dataframe made easy ☆16 · Dec 15, 2021 · Updated 4 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆17 · Sep 13, 2020 · Updated 5 years ago
- PySpark Cheatsheet ☆36 · Jan 18, 2023 · Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 2 years ago
- Collection of Databricks and Jupyter Notebooks ☆22 · Feb 9, 2026 · Updated 3 weeks ago
- Apache Spark Interview Question and Answers ☆21 · Oct 13, 2020 · Updated 5 years ago
- Ravi Azure ADB ADF Repository ☆64 · Jan 25, 2025 · Updated last year
- This data project can be used as a take-home assignment to learn Pyspark and Data Engineering. ☆17 · Feb 19, 2023 · Updated 3 years ago
- Reusable Python classes that extend open source PySpark capabilities. Examples of implementation is available under notebooks of repo htt… ☆13 · Nov 1, 2024 · Updated last year
- Databricks Platform - Architecture, Security, Automation and much more!! ☆54 · Feb 27, 2026 · Updated last week
- This repo contains commands that data engineers use in day to day work. ☆61 · Feb 4, 2023 · Updated 3 years ago
- ☆15 · Jan 17, 2022 · Updated 4 years ago
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli… ☆16 · Dec 3, 2018 · Updated 7 years ago
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- My Study guide used to pass the CRT020 Spark Certification exam ☆34 · Jan 6, 2020 · Updated 6 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆57 · Oct 20, 2022 · Updated 3 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Dec 15, 2024 · Updated last year
- Commercetools Python SDK ☆17 · Sep 4, 2025 · Updated 6 months ago
- Repository related to Spark SQL and Pyspark using Python3 ☆42 · Jun 12, 2022 · Updated 3 years ago
- Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow. ☆20 · Aug 4, 2021 · Updated 4 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆42 · Jun 29, 2022 · Updated 3 years ago
- This repo is mostly created for pyspark and hive related interview questions. ☆63 · Jan 6, 2026 · Updated 2 months ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ☆15 · Apr 29, 2021 · Updated 4 years ago
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit ☆20 · Aug 5, 2022 · Updated 3 years ago
- This repository holds files and scripts for incorporating simple CI/CD practices for model training in ML. ☆21 · Oct 26, 2021 · Updated 4 years ago
- Case Study's from Danny Ma's Serious SQL Course ☆19 · Aug 4, 2022 · Updated 3 years ago
- Develop ML models predict taxi trip duration in NYC. Ranked : Top 6% | RMSLE : 0.377 (Kaggle) | #DS ☆17 · Jan 7, 2023 · Updated 3 years ago
- Resources and projects from Udacity Data Engineering with AWS nano degree programme ☆27 · Apr 12, 2023 · Updated 2 years ago
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc… ☆23 · May 14, 2022 · Updated 3 years ago
- Databricks. Incremental data processing, task orchestration, and production job monitoring. ☆39 · Feb 27, 2024 · Updated 2 years ago
- Lab environment deployments for the Microsoft data engineering (DP-203) ILT learning content. ☆28 · Jun 29, 2021 · Updated 4 years ago
- This repository contains code for Spark Streaming ☆26 · Mar 11, 2021 · Updated 4 years ago
- In this project, we will build and ETL(Extract,Transform,Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro… ☆25 · May 6, 2023 · Updated 2 years ago
- End to end data engineering project ☆58 · Oct 27, 2022 · Updated 3 years ago
- Guide for databricks spark certification ☆59 · Jun 13, 2021 · Updated 4 years ago