This repo contains commands that data engineers use in day-to-day work.
☆62 · Feb 4, 2023 · Updated 3 years ago
Alternatives and similar repositories for TowardsDataEngineering
Users who are interested in TowardsDataEngineering compare it to the repositories listed below.
- PySpark Cheatsheet ☆36 · Jan 18, 2023 · Updated 3 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆492 · Oct 15, 2024 · Updated last year
- This repo is mostly created for PySpark and Hive related interview questions. ☆63 · Jan 6, 2026 · Updated 3 months ago
- This data project can be used as a take-home assignment to learn PySpark and Data Engineering. ☆18 · Feb 19, 2023 · Updated 3 years ago
- Data Engineering, Data Warehouse, Data Mart, Cloud Data, AWS, SAS, Redshift, S3 ☆32 · Feb 2, 2021 · Updated 5 years ago
- Serious SQL is a Data With Danny virtual data apprenticeship program. ☆22 · Sep 3, 2021 · Updated 4 years ago
- Big Data for Data Engineers Coursera Specialization from Yandex ☆100 · Mar 15, 2023 · Updated 3 years ago
- This is an all-in-one repository for Data Engineers, ideal for beginners & interview preparation, which includes Python as the main Progr… ☆32 · Mar 21, 2023 · Updated 3 years ago
- ☆95 · Sep 14, 2022 · Updated 3 years ago
- Data engineering interviews Q&A for data community by data community ☆66 · Jun 7, 2020 · Updated 5 years ago
- ☆27 · Feb 2, 2018 · Updated 8 years ago
- ☆18 · Nov 9, 2025 · Updated 5 months ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆91 · Jul 17, 2019 · Updated 6 years ago
- Case studies from Danny Ma's Serious SQL Course ☆19 · Aug 4, 2022 · Updated 3 years ago
- Data Engineering Bootcamp 2021 ☆13 · Aug 8, 2023 · Updated 2 years ago
- A data engineering project with Airflow, dbt, Terraform, GCP and much more! ☆26 · Nov 8, 2022 · Updated 3 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS ☆52 · Aug 23, 2019 · Updated 6 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 · Jan 22, 2024 · Updated 2 years ago
- This repository will help you learn Databricks concepts with the help of examples. It will include all the important topics which… ☆104 · Sep 26, 2025 · Updated 7 months ago
- Python ETL demo for Hackforge ☆32 · Oct 11, 2023 · Updated 2 years ago
- Data Science Learning Notes ☆11 · Oct 18, 2023 · Updated 2 years ago
- ☆15 · Feb 4, 2023 · Updated 3 years ago
- Personal Repository of Data Science Projects ☆14 · May 8, 2019 · Updated 6 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri… ☆26 · Feb 9, 2021 · Updated 5 years ago
- ☆12 · Jan 20, 2023 · Updated 3 years ago
- A tool to validate data, built around Apache Spark. ☆102 · Feb 19, 2026 · Updated 2 months ago
- ☆10 · Nov 28, 2022 · Updated 3 years ago
- (These solutions were tested on a 4-node Hortonworks cluster on my laptop. Do not test on your production environment until you test... :) ☆20 · Apr 18, 2020 · Updated 6 years ago
- Example end to end data engineering project. ☆1,412 · Dec 8, 2022 · Updated 3 years ago
- Scala data validation library ☆30 · Aug 14, 2016 · Updated 9 years ago
- Personal Data Engineering Projects ☆1,011 · Feb 8, 2023 · Updated 3 years ago
- ☆11 · Jul 13, 2020 · Updated 5 years ago
- This repo contains all the code used in the Python for Data Engineering Course ☆361 · Apr 24, 2024 · Updated 2 years ago
- Simple utility to analyze GitHub public profile. ☆13 · Oct 3, 2020 · Updated 5 years ago
- Apache Airflow advanced functionalities examples ☆21 · Mar 22, 2024 · Updated 2 years ago
- Udacity Data Engineering Nanodegree Capstone Project ☆36 · May 9, 2020 · Updated 5 years ago
- ☆13 · Oct 15, 2021 · Updated 4 years ago
- Collection of Databricks and Jupyter Notebooks ☆22 · Feb 9, 2026 · Updated 2 months ago
- ☆16 · Jan 8, 2023 · Updated 3 years ago