This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. Development uses PySpark and Spark SQL. The course ends with a few case studies.
☆104 · Sep 26, 2025 · Updated 6 months ago
Alternatives and similar repositories for ApacheSpark
Users that are interested in ApacheSpark are comparing it to the libraries listed below.
- Airflow & DBT Cloud Integrated Project Presented at Lagos DBT Community Meetup & DataFestAfrica 23 ☆13 · Oct 11, 2023 · Updated 2 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) ☆17 · Dec 18, 2018 · Updated 7 years ago
- Simple ETL pipeline using Python ☆29 · May 22, 2023 · Updated 2 years ago
- ☆15 · Jan 17, 2022 · Updated 4 years ago
- ☆24 · Jul 21, 2022 · Updated 3 years ago
- PySpark Cheatsheet ☆36 · Jan 18, 2023 · Updated 3 years ago
- Ravi Azure ADB ADF Repository ☆65 · Jan 25, 2025 · Updated last year
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli… ☆17 · Dec 3, 2018 · Updated 7 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆17 · Sep 13, 2020 · Updated 5 years ago
- Collection of Databricks and Jupyter Notebooks ☆22 · Feb 9, 2026 · Updated last month
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 2 years ago
- pyspark dataframe made easy ☆16 · Dec 15, 2021 · Updated 4 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆490 · Oct 15, 2024 · Updated last year
- Code Repository for my 3rd Data Project. ☆16 · Jun 13, 2023 · Updated 2 years ago
- Repository related to Spark SQL and Pyspark using Python3 ☆42 · Jun 12, 2022 · Updated 3 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆57 · Oct 20, 2022 · Updated 3 years ago
- This repo contains commands that data engineers use in day to day work. ☆62 · Feb 4, 2023 · Updated 3 years ago
- Develop ML models predict taxi trip duration in NYC. Ranked : Top 6% | RMSLE : 0.377 (Kaggle) | #DS ☆17 · Jan 7, 2023 · Updated 3 years ago
- ☆27 · Jun 14, 2022 · Updated 3 years ago
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- Databricks Platform - Architecture, Security, Automation and much more!! ☆55 · Updated this week
- Apache Spark Interview Question and Answers ☆21 · Oct 13, 2020 · Updated 5 years ago
- Case Study's from Danny Ma's Serious SQL Course ☆19 · Aug 4, 2022 · Updated 3 years ago
- Resources and projects from Udacity Data Engineering with AWS nano degree programme ☆28 · Apr 12, 2023 · Updated 2 years ago
- I implemented various ETL processes like loading the data using sqoop from mysql to hdfs, transform the data using Spark and Scala, perfo… ☆10 · Oct 20, 2017 · Updated 8 years ago
- 65 Articles on SQL: A Comprehensive Guide to Mastering Advanced SQL ☆11 · Jun 7, 2023 · Updated 2 years ago
- dbt sample project for Snowflake using the `TPCH` dataset that ships as a shared database with Snowflake. ☆21 · Apr 5, 2022 · Updated 3 years ago
- Public Docker Images for popular services ☆50 · Sep 7, 2025 · Updated 6 months ago
- This data project can be used as a take-home assignment to learn Pyspark and Data Engineering. ☆17 · Feb 19, 2023 · Updated 3 years ago
- Contains spark dataframe solutions of leetcode questions ☆24 · Dec 13, 2022 · Updated 3 years ago
- Python scripts to convert and unpack mainframe EBCDIC data on the cloud or any ASCII environment. ☆43 · Aug 25, 2025 · Updated 7 months ago
- Dockerizing an Apache Spark Standalone Cluster ☆42 · Jun 29, 2022 · Updated 3 years ago
- End to end data engineering project ☆58 · Oct 27, 2022 · Updated 3 years ago
- This project implements a Lakehouse Medallion Architecture using modern Data Stack tools such as Fivetran, Snowflake and dbt. The fictici… ☆14 · Sep 30, 2024 · Updated last year
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course I gave to one of our clients in Dece… ☆10 · Feb 3, 2016 · Updated 10 years ago
- The 6 most window functions in PySpark - based on my blog post ☆12 · Dec 15, 2023 · Updated 2 years ago
- In this project, we will build and ETL(Extract,Transform,Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro… ☆25 · May 6, 2023 · Updated 2 years ago
- Complete SQL Project for data analysis with source code. ☆371 · Oct 11, 2022 · Updated 3 years ago
- PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 ☆88 · Jan 3, 2020 · Updated 6 years ago