This repository teaches Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. Development uses PySpark and Spark SQL. The course ends with a few case studies.
☆104 · Sep 26, 2025 · Updated 7 months ago
Alternatives and similar repositories for ApacheSpark
Users who are interested in ApacheSpark are comparing it to the libraries listed below.
- Airflow & DBT Cloud Integrated Project Presented at Lagos DBT Community Meetup & DataFestAfrica 23 ☆13 · Oct 11, 2023 · Updated 2 years ago
- Leetcode SQL Solutions ☆191 · Aug 26, 2023 · Updated 2 years ago
- Simple ETL pipeline using Python ☆29 · May 22, 2023 · Updated 2 years ago
- ☆24 · Jul 21, 2022 · Updated 3 years ago
- PySpark Cheatsheet ☆36 · Jan 18, 2023 · Updated 3 years ago
- Ravi Azure ADB ADF Repository ☆65 · Jan 25, 2025 · Updated last year
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli… ☆17 · Dec 3, 2018 · Updated 7 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆17 · Sep 13, 2020 · Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆56 · May 6, 2023 · Updated 3 years ago
- pyspark dataframe made easy ☆16 · Dec 15, 2021 · Updated 4 years ago
- This repository contains code for Spark Streaming ☆26 · Mar 11, 2021 · Updated 5 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆493 · Oct 15, 2024 · Updated last year
- Reusable Python classes that extend open source PySpark capabilities. Examples of implementation are available under notebooks of repo htt… ☆13 · Nov 1, 2024 · Updated last year
- Code Repository for my 3rd Data Project. ☆16 · Jun 13, 2023 · Updated 2 years ago
- Repository related to Spark SQL and Pyspark using Python3 ☆42 · Jun 12, 2022 · Updated 3 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… ☆57 · Oct 20, 2022 · Updated 3 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from… ☆35 · Jan 5, 2023 · Updated 3 years ago
- This repo contains commands that data engineers use in day to day work. ☆62 · Feb 4, 2023 · Updated 3 years ago
- ☆28 · Jun 14, 2022 · Updated 3 years ago
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite ☆15 · Aug 22, 2019 · Updated 6 years ago
- A shell script to automate the operations of sqoop ☆11 · Mar 29, 2021 · Updated 5 years ago
- Apache Spark Interview Question and Answers ☆21 · Oct 13, 2020 · Updated 5 years ago
- This repo is mostly created for pyspark and hive related interview questions. ☆63 · Jan 6, 2026 · Updated 4 months ago
- Case Studies from Danny Ma's Serious SQL Course ☆19 · Aug 4, 2022 · Updated 3 years ago
- Some Windows images for tool images that I had to use in a Windows Environment. ☆10 · Sep 27, 2020 · Updated 5 years ago
- Resources and projects from Udacity Data Engineering with AWS nano degree programme ☆29 · Apr 12, 2023 · Updated 3 years ago
- 65 Articles on SQL: A Comprehensive Guide to Mastering Advanced SQL ☆11 · Jun 7, 2023 · Updated 2 years ago
- dbt sample project for Snowflake using the `TPCH` dataset that ships as a shared database with Snowflake. ☆21 · Apr 5, 2022 · Updated 4 years ago
- simple ETL example ☆16 · Jun 1, 2020 · Updated 5 years ago
- This data project can be used as a take-home assignment to learn Pyspark and Data Engineering. ☆18 · Feb 19, 2023 · Updated 3 years ago
- Public Docker Images for popular services ☆53 · Sep 7, 2025 · Updated 7 months ago
- Contains spark dataframe solutions of leetcode questions ☆24 · Dec 13, 2022 · Updated 3 years ago
- Python scripts to convert and unpack mainframe EBCDIC data on the cloud or any ASCII environment. ☆44 · Aug 25, 2025 · Updated 8 months ago
- Dockerizing an Apache Spark Standalone Cluster ☆42 · Jun 29, 2022 · Updated 3 years ago
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit ☆20 · Aug 5, 2022 · Updated 3 years ago
- End to end data engineering project ☆58 · Oct 27, 2022 · Updated 3 years ago
- The 6 most window functions in PySpark - based on my blog post ☆12 · Dec 15, 2023 · Updated 2 years ago
- In this project, we will build an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data fro… ☆25 · May 6, 2023 · Updated 3 years ago
- adidas Data Mesh implementation ☆12 · May 13, 2022 · Updated 3 years ago