san089 / Optimizing-Public-Transportation
A real-time event pipeline built around the Kafka ecosystem for the Chicago Transit Authority.
⭐ 29 · Updated last year
Related projects
Alternatives and complementary repositories for Optimizing-Public-Transportation
- data engineering 100 days | #DE · ⭐ 37 · Updated last year
- Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation,… · ⭐ 89 · Updated 2 years ago
- Data engineering interview Q&A for the data community, by the data community · ⭐ 61 · Updated 4 years ago
- Design/Implement stream/batch architecture on NYC taxi data | #DE · ⭐ 26 · Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… · ⭐ 53 · Updated last year
- A way for home buyers to know about factors affecting a state · ⭐ 47 · Updated 5 years ago
- (project & tutorial) DAG pipeline tests + CI/CD setup · ⭐ 85 · Updated 3 years ago
- RedditR for Content Engagement and Recommendation · ⭐ 21 · Updated 6 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… · ⭐ 24 · Updated 2 years ago
- How to build an awesome data engineering team · ⭐ 99 · Updated 5 years ago
- Finance Data Builder @ postgres · ⭐ 18 · Updated 3 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as … · ⭐ 16 · Updated 5 years ago
- A repo to track data engineering projects · ⭐ 13 · Updated 2 years ago
- Simple stream processing pipeline · ⭐ 92 · Updated 5 months ago
- PySpark functions and utilities with examples. Assists the ETL process of data modeling · ⭐ 99 · Updated 3 years ago
- My solutions for the Udacity Data Engineering Nanodegree · ⭐ 33 · Updated 5 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree · ⭐ 74 · Updated 11 months ago
- Code for my "Efficient Data Processing in SQL" book. · ⭐ 50 · Updated 3 months ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow · ⭐ 133 · Updated 4 years ago
- Public source code for the Batch Processing with Apache Beam (Python) online course · ⭐ 19 · Updated 4 years ago
- A repository of sample code to show data quality checking best practices using Airflow. · ⭐ 72 · Updated last year
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … · ⭐ 132 · Updated 4 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… · ⭐ 73 · Updated 5 years ago
- Data lake, data warehouse on GCP · ⭐ 54 · Updated 2 years ago
- Weekly Data Engineering Newsletter · ⭐ 93 · Updated 4 months ago
- ⭐ 86 · Updated 2 years ago
- Projects done in the Data Engineer Nanodegree Program by Udacity.com · ⭐ 94 · Updated last year
- Data Engineering Capstone Project: ETL Pipelines and Data Warehouse Development · ⭐ 21 · Updated 5 years ago
- Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups · ⭐ 16 · Updated 6 years ago