This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. Development uses PySpark and Spark SQL. The course ends with a few case studies.
☆104 · Sep 26, 2025 · Updated 8 months ago
Alternatives and similar repositories for ApacheSpark
Users that are interested in ApacheSpark are comparing it to the libraries listed below.
- Airflow & DBT Cloud Integrated Project Presented at Lagos DBT Community Meetup & DataFestAfrica 23 ☆13 · Oct 11, 2023 · Updated 2 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) ☆17 · Dec 18, 2018 · Updated 7 years ago
- Leetcode SQL Solutions ☆194 · Aug 26, 2023 · Updated 2 years ago
- Simple ETL pipeline using Python ☆29 · May 22, 2023 · Updated 3 years ago
- ☆24 · Jul 21, 2022 · Updated 3 years ago
- PySpark Cheatsheet ☆36 · Jan 18, 2023 · Updated 3 years ago
- Ravi Azure ADB ADF Repository ☆65 · Jan 25, 2025 · Updated last year
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli… ☆17 · Dec 3, 2018 · Updated 7 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆17 · Sep 13, 2020 · Updated 5 years ago
- Collection of Databricks and Jupyter Notebooks ☆22 · Feb 9, 2026 · Updated 4 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 3 years ago
- pyspark dataframe made easy ☆16 · Dec 15, 2021 · Updated 4 years ago
- Commercetools Python SDK ☆17 · Apr 28, 2026 · Updated last month
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆497 · Oct 15, 2024 · Updated last year
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Dec 15, 2024 · Updated last year
- Reusable Python classes that extend open source PySpark capabilities. Examples of implementation is available under notebooks of repo htt… ☆13 · Nov 1, 2024 · Updated last year
- Repository related to Spark SQL and Pyspark using Python3 ☆42 · Jun 12, 2022 · Updated 4 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆57 · Oct 20, 2022 · Updated 3 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from… ☆35 · Jan 5, 2023 · Updated 3 years ago
- This repo contains commands that data engineers use in day to day work. ☆61 · Feb 4, 2023 · Updated 3 years ago
- Develop ML models predict taxi trip duration in NYC. Ranked : Top 6% | RMSLE : 0.377 (Kaggle) | #DS ☆17 · Jan 7, 2023 · Updated 3 years ago
- ☆28 · Jun 14, 2022 · Updated 4 years ago
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- Databricks Platform - Architecture, Security, Automation and much more!! ☆56 · Jun 4, 2026 · Updated last week
- A shell script to automate the operations of sqoop ☆11 · Mar 29, 2021 · Updated 5 years ago
- Apache Spark Interview Question and Answers ☆21 · Oct 13, 2020 · Updated 5 years ago
- Case Study's from Danny Ma's Serious SQL Course ☆19 · Aug 4, 2022 · Updated 3 years ago
- Some Windows images for tool images that I had to use in a Windows Environment. ☆10 · Sep 27, 2020 · Updated 5 years ago
- Resources and projects from Udacity Data Engineering with AWS nano degree programme ☆29 · Apr 12, 2023 · Updated 3 years ago
- 65 Articles on SQL: A Comprehensive Guide to Mastering Advanced SQL ☆11 · Jun 7, 2023 · Updated 3 years ago
- simple ETL example ☆16 · Jun 1, 2020 · Updated 6 years ago
- Public Docker Images for popular services ☆53 · Sep 7, 2025 · Updated 9 months ago
- Contains spark dataframe solutions of leetcode questions ☆24 · Dec 13, 2022 · Updated 3 years ago
- Python scripts to convert and unpack mainframe EBCDIC data on the cloud or any ASCII environment. ☆45 · Aug 25, 2025 · Updated 9 months ago
- Dockerizing an Apache Spark Standalone Cluster ☆42 · Jun 29, 2022 · Updated 3 years ago
- ☆16 · May 27, 2025 · Updated last year
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit ☆20 · Aug 5, 2022 · Updated 3 years ago
- End to end data engineering project ☆59 · Oct 27, 2022 · Updated 3 years ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course I gave to one of our clients in Dece… ☆10 · Feb 3, 2016 · Updated 10 years ago