This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We use PySpark and Spark SQL for development, and at the end of the course we also cover a few case studies.
☆104 · Sep 26, 2025 · Updated 6 months ago
Alternatives and similar repositories for ApacheSpark
Users who are interested in ApacheSpark are comparing it to the repositories listed below.
- Airflow & DBT Cloud Integrated Project Presented at Lagos DBT Community Meetup & DataFestAfrica 23 · ☆13 · Oct 11, 2023 · Updated 2 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS) · ☆17 · Dec 18, 2018 · Updated 7 years ago
- Leetcode SQL Solutions · ☆191 · Aug 26, 2023 · Updated 2 years ago
- ☆15 · Jan 17, 2022 · Updated 4 years ago
- PySpark Cheatsheet · ☆36 · Jan 18, 2023 · Updated 3 years ago
- Ravi Azure ADB ADF Repository · ☆65 · Jan 25, 2025 · Updated last year
- Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli… · ☆17 · Dec 3, 2018 · Updated 7 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide · ☆17 · Sep 13, 2020 · Updated 5 years ago
- Collection of Databricks and Jupyter Notebooks · ☆22 · Feb 9, 2026 · Updated 2 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… · ☆56 · May 6, 2023 · Updated 2 years ago
- Commercetools Python SDK · ☆17 · Sep 4, 2025 · Updated 7 months ago
- This repository contains code for Spark Streaming · ☆26 · Mar 11, 2021 · Updated 5 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster · ☆491 · Oct 15, 2024 · Updated last year
- Code Repository for my 3rd Data Project. · ☆16 · Jun 13, 2023 · Updated 2 years ago
- Repository related to Spark SQL and Pyspark using Python3 · ☆42 · Jun 12, 2022 · Updated 3 years ago
- Solution to all projects of Udacity's Data Engineering Nanodegree: Data Modeling with Postgres & Cassandra, Data Warehouse with Redshift,… · ☆57 · Oct 20, 2022 · Updated 3 years ago
- This repo contains commands that data engineers use in day-to-day work. · ☆62 · Feb 4, 2023 · Updated 3 years ago
- Databricks. Incremental data processing, task orchestration, and production job monitoring. · ☆44 · Feb 27, 2024 · Updated 2 years ago
- Repository for Microsoft Databricks Training Events - Hosted by BlueGranite · ☆15 · Aug 22, 2019 · Updated 6 years ago
- Databricks Platform - Architecture, Security, Automation and much more!! · ☆56 · Apr 7, 2026 · Updated last week
- A shell script to automate the operations of sqoop · ☆11 · Mar 29, 2021 · Updated 5 years ago
- This repo is mostly created for pyspark and hive related interview questions. · ☆63 · Jan 6, 2026 · Updated 3 months ago
- Some Windows images for tool images that I had to use in a Windows Environment.