longNguyen010203 / Youtube-ETL-Pipeline
A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API.
⭐ 13 · Updated 3 months ago
Related projects:
- This project implements an ELT (Extract, Load, Transform) data pipeline with the Goodreads dataset, using Dagster (orchestration), spar… ⭐ 27 · Updated last year
- ⭐ 29 · Updated 7 months ago
- This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ⭐ 24 · Updated 9 months ago
- Data pipeline that scrapes Rust cheater Steam profiles ⭐ 50 · Updated 2 years ago
- A data engineering project with Airflow, dbt, Terraform, GCP, and much more! ⭐ 21 · Updated last year
- Resources and projects from the Udacity Data Engineering with AWS Nanodegree programme ⭐ 22 · Updated last year
- DataTalks.Club's Data Engineering Zoomcamp Project ⭐ 19 · Updated 2 years ago
- Nyc_Taxi_Data_Pipeline - DE Project ⭐ 62 · Updated last month
- velib-v2: an ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools ⭐ 17 · Updated last week
- End-to-end data engineering project ⭐ 49 · Updated last year
- Get crypto data from an API, stream it to Kafka with Airflow, write the data to MySQL, and visualize it with Metabase ⭐ 13 · Updated 11 months ago
- This project shows how to capture changes from a PostgreSQL database and stream them into Kafka ⭐ 28 · Updated 4 months ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ⭐ 33 · Updated 9 months ago
- Code repository for my third data project. ⭐ 12 · Updated last year
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa… ⭐ 19 · Updated last year
- Data engineering project to extract and process Solana Reddit data ⭐ 18 · Updated 7 months ago
- A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousin… ⭐ 12 · Updated 3 years ago
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, PostgreSQL, and Superset ⭐ 22 · Updated 2 months ago
- My personal project for the Data Engineering Zoomcamp ⭐ 12 · Updated this week
- Code for my "Efficient Data Processing in SQL" book. ⭐ 47 · Updated last month
- Code for the blog post at https://www.startdataengineering.com/post/python-for-de/ ⭐ 47 · Updated 3 months ago
- Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Stre… ⭐ 50 · Updated 3 weeks ago
- Capstone project for the Dataengineer.io bootcamp (public repo) ⭐ 11 · Updated 7 months ago
- ⭐ 35 · Updated 2 months ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ⭐ 25 · Updated 8 months ago
- My first attempt at a rough ETL pipeline; technologies include Spark, GCS, Prefect orchestration, and Terraform ⭐ 14 · Updated last year
- ⭐ 20 · Updated 3 months ago
- Data pipeline for the Global Historical Climatology Network dataset ⭐ 24 · Updated last year
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en… ⭐ 13 · Updated 7 months ago
- A project portfolio to accompany my resume ⭐ 21 · Updated last year