This repository will help you to learn about databricks concept with the help of examples. It will include all the important topics which we need in our real life experience as a data engineer. We will be using pyspark & sparksql for the development. At the end of the course we also cover few case studies.
☆104Sep 26, 2025Updated 8 months ago
Alternatives and similar repositories for ApacheSpark
Users that are interested in ApacheSpark are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Airflow & DBT Cloud Integrated Project Presented at Lagos DBT Community Meetup & DataFestAfrica 23☆13Oct 11, 2023Updated 2 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS)☆17Dec 18, 2018Updated 7 years ago
- Leetcode SQL Solutions☆194Aug 26, 2023Updated 2 years ago
- Simple ETL pipeline using Python☆29May 22, 2023Updated 3 years ago
- ☆15Jan 17, 2022Updated 4 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- ☆24Jul 21, 2022Updated 3 years ago
- PySpark Cheatsheet☆36Jan 18, 2023Updated 3 years ago
- Ravi Azure ADB ADF Repository☆65Jan 25, 2025Updated last year
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli…☆17Dec 3, 2018Updated 7 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide☆17Sep 13, 2020Updated 5 years ago
- Collection of Databricks and Jupyter Notebooks☆22Feb 9, 2026Updated 3 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆56May 6, 2023Updated 3 years ago
- pyspark dataframe made easy☆16Dec 15, 2021Updated 4 years ago
- Commercetools Python SDK☆17Apr 28, 2026Updated 3 weeks ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- This repository contains code for Spark Streaming☆26Mar 11, 2021Updated 5 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster☆493Oct 15, 2024Updated last year
- Code Repository for my 3rd Data Project.☆16Jun 13, 2023Updated 2 years ago
- Repository related to Spark SQL and Pyspark using Python3☆42Jun 12, 2022Updated 3 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,…☆57Oct 20, 2022Updated 3 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from…☆35Jan 5, 2023Updated 3 years ago
- This repo contains commands that data engineers use in day to day work.☆61Feb 4, 2023Updated 3 years ago
- Develop ML models predict taxi trip duration in NYC. Ranked : Top 6% | RMSLE : 0.377 (Kaggle) | #DS☆17Jan 7, 2023Updated 3 years ago
- ☆28Jun 14, 2022Updated 3 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Databricks Platform - Architecture, Security, Automation and much more!!☆56Apr 7, 2026Updated last month
- Template for Scala Spark with Unit Test☆13Jul 24, 2023Updated 2 years ago
- Databricks. Incremental data processing, task orchestration, and production job monitoring.☆45Feb 27, 2024Updated 2 years ago
- A shell script to automate the operations of sqoop☆11Mar 29, 2021Updated 5 years ago
- Apache Spark Interview Question and Answers☆21Oct 13, 2020Updated 5 years ago
- Case Study's from Danny Ma's Serious SQL Course☆19Aug 4, 2022Updated 3 years ago
- Some Windows images for tool images that I had to use in a Windows Environment.☆10Sep 27, 2020Updated 5 years ago
- Resources and projects from Udacity Data Engineering with AWS nano degree programme☆29Apr 12, 2023Updated 3 years ago
- 65 Articles on SQL: A Comprehensive Guide to Mastering Advanced SQL☆11Jun 7, 2023Updated 2 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- This data project can be used as a take-home assignment to learn Pyspark and Data Engineering.☆18Feb 19, 2023Updated 3 years ago
- Public Docker Images for popular services☆53Sep 7, 2025Updated 8 months ago
- Contains spark dataframe solutions of leetcode questions☆24Dec 13, 2022Updated 3 years ago
- Dockerizing an Apache Spark Standalone Cluster☆42Jun 29, 2022Updated 3 years ago
- End to end data engineering project☆59Oct 27, 2022Updated 3 years ago
- A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course I gave to one of our clients in Dece…☆10Feb 3, 2016Updated 10 years ago
- The 6 most window functions in PySpark - based on my blog post☆12Dec 15, 2023Updated 2 years ago