Practice your PySpark skills!
☆106 · Oct 22, 2021 · Updated 4 years ago
Alternatives and similar repositories for pyspark_exercises
Users who are interested in pyspark_exercises are comparing it to the libraries listed below.
- Code snippets and tutorials for working with social science data in PySpark ☆416 · Aug 11, 2017 · Updated 8 years ago
- StreamSoft enables real-time analysis of any stock market ☆15 · Apr 24, 2024 · Updated 2 years ago
- Demo of using Airflow ☆11 · Jun 24, 2022 · Updated 3 years ago
- This repository focuses on providing interview scenario questions that I have encountered during interviews. The questions are designed t… ☆51 · Feb 11, 2025 · Updated last year
- ☆94 · Dec 17, 2024 · Updated last year
- Top Big Tech Data Science Questions ☆19 · Sep 22, 2022 · Updated 3 years ago
- This is a template you can use for your next data engineering portfolio project. ☆191 · Sep 10, 2021 · Updated 4 years ago
- universal-datalakehouse-postgres-ingestion-deltastreamer ☆11 · Apr 7, 2024 · Updated 2 years ago
- A blog app written in Flask ☆15 · Jan 3, 2015 · Updated 11 years ago
- PySpark functions and utilities with examples. Assists the ETL process of data modeling ☆104 · Dec 3, 2020 · Updated 5 years ago
- In this project I used Apache Airflow to scrape a website periodically. This is for the tutorials I do on YouTube. ☆10 · Nov 21, 2022 · Updated 3 years ago
- The goal of this project is to analyse the impact of Covid-19 on the Aviation industry through data engineering processes using technolog… ☆13 · Jun 26, 2022 · Updated 3 years ago
- ☆10 · Apr 10, 2019 · Updated 7 years ago
- All descriptive information and shared materials from any of the AICG-sponsored Meetups or Webinars ☆17 · Feb 28, 2024 · Updated 2 years ago
- Get started setting up infrastructure as code on Google Cloud Platform ☆11 · Jun 13, 2021 · Updated 4 years ago
- End-to-end data engineering project with Kafka, Airflow, Spark, Postgres and Docker. ☆111 · Jan 8, 2026 · Updated 4 months ago
- Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake ☆281 · Jun 27, 2025 · Updated 10 months ago
- ☆14 · Jan 22, 2019 · Updated 7 years ago
- Bash script to find and execute Java classes with main methods ☆20 · Oct 24, 2025 · Updated 6 months ago
- This is the first project where we worked on Apache Spark. In this project, we downloaded the datasets from KAGG… ☆23 · Oct 14, 2021 · Updated 4 years ago
- Self-improving AI agents using Agentic Context Engineering - A starter implementation with Google ADK ☆21 · Oct 23, 2025 · Updated 6 months ago
- Fast monkey blog ☆10 · Apr 1, 2018 · Updated 8 years ago
- Solved data engineering exercises using PySpark ☆16 · Aug 2, 2021 · Updated 4 years ago
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆24 · Apr 2, 2022 · Updated 4 years ago
- This project aims to move the data from a relational database system (RDBMS) to a Hadoop file system (HDFS) ☆11 · Apr 29, 2022 · Updated 4 years ago
- ☆18 · Apr 20, 2018 · Updated 8 years ago
- This repo contains "Azure Data Engineer Associate" questions and related docs. ☆13 · Jan 29, 2024 · Updated 2 years ago
- Generic exploit for master key vulnerability in Android ☆34 · Feb 6, 2015 · Updated 11 years ago
- ☆12 · Jun 9, 2025 · Updated 11 months ago
- Examples for High Performance Spark ☆16 · Oct 25, 2025 · Updated 6 months ago
- Multi-Agent AI Application (Python) that uses Semantic-Kernel along with Azure AI Agent Service in Azure AI Foundry ☆15 · Mar 6, 2025 · Updated last year
- My first attempt at a rough ETL pipeline; technologies include Spark, GCS, Prefect orchestration, and Terraform ☆14 · Oct 12, 2022 · Updated 3 years ago
- ☆25 · Apr 20, 2026 · Updated 2 weeks ago
- This repository contains my solutions to the top 50 LeetCode SQL challenges implemented using PySpark DataFrame and PySpark SQL. ☆29 · Mar 16, 2024 · Updated 2 years ago
- Glue ETL job or EMR Spark job that reads from the Data Catalog, modifies the data, and uploads it to S3 and the Data Catalog ☆13 · Aug 26, 2023 · Updated 2 years ago
- ☆28 · Aug 29, 2022 · Updated 3 years ago
- Making the transition from Scratch to Python ☆11 · Apr 11, 2017 · Updated 9 years ago
- Resources and projects from the Udacity Data Engineering with AWS Nanodegree programme ☆29 · Apr 12, 2023 · Updated 3 years ago
- Livy Manager - Web UI for Managing Apache Livy Sessions ☆16 · Dec 7, 2017 · Updated 8 years ago