ahmed-gharib89 / flights-delta-lake
☆9 · Updated 2 years ago
Alternatives and similar repositories for flights-delta-lake:
Users interested in flights-delta-lake are comparing it to the libraries listed below.
- A curated list of awesome Databricks resources, including Spark ☆17 · Updated 10 months ago
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆27 · Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up ☆51 · Updated 4 years ago
- Data engineering interviews Q&A for data community by data community ☆63 · Updated 4 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆54 · Updated last year
- ☆22 · Updated 2 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆76 · Updated 2 years ago
- pyspark dataframe made easy ☆16 · Updated 3 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆175 · Updated 3 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 · Updated 2 years ago
- Pyspark boilerplate for running prod ready data pipeline ☆28 · Updated 4 years ago
- This repo contains live examples to build Databricks' Lakehouse and recommended best practices from the field. ☆18 · Updated 6 months ago
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database ☆13 · Updated 3 years ago
- Road to Azure Data Engineer Part-II: DP-201 - Designing an Azure Data Solution ☆19 · Updated 4 years ago
- Delta Lake Documentation ☆49 · Updated 10 months ago
- This repository will help you to learn about databricks concept with the help of examples. It will include all the important topics which… ☆98 · Updated 8 months ago
- data engineering 100 days 🤖 🧲 🦾 | #DE ☆40 · Updated last year
- Glue VSCode devcontainer setup ☆14 · Updated 2 years ago
- A Pyspark job to handle upserts, conversion to parquet and create partitions on S3 ☆26 · Updated 4 years ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on… ☆27 · Updated 2 years ago
- Spark data pipeline that processes movie ratings data. ☆28 · Updated 3 weeks ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆90 · Updated 3 years ago
- Code for my "Efficient Data Processing in SQL" book. ☆56 · Updated 8 months ago
- how to unit test your PySpark code ☆28 · Updated 4 years ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 · Updated 4 years ago
- ☆87 · Updated 2 years ago
- A Python PySpark Project with Poetry ☆23 · Updated 7 months ago
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" ☆38 · Updated 9 months ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆136 · Updated 5 years ago
- This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and … ☆29 · Updated last year