DucAnhNTT / bigdata-ETL-pipeline
The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a complete data pipeline with all components set up and ready to use.
☆13Updated last year
Alternatives and similar repositories for bigdata-ETL-pipeline:
Users who are interested in bigdata-ETL-pipeline are comparing it to the repositories listed below
- Building a Data Lakehouse with open-source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…☆26Updated last year
- Cost Efficient Data Pipelines with DuckDB☆52Updated 9 months ago
- Delta-Lake, ETL, Spark, Airflow☆47Updated 2 years ago
- Full stack data engineering tools and infrastructure set-up☆52Updated 4 years ago
- A Postgres data warehouse for processing synthetic data using IaC principles☆17Updated 2 years ago
- Build a data warehouse (DW) with dbt☆44Updated 6 months ago
- End-to-end data platform leveraging the Modern data stack☆47Updated last year
- ☆40Updated 10 months ago
- Code for blog at https://www.startdataengineering.com/post/python-for-de/☆74Updated 11 months ago
- Code for my "Efficient Data Processing in SQL" book.☆56Updated 9 months ago
- This project shows how to capture changes from a Postgres database and stream them into Kafka☆36Updated 11 months ago
- Repo for CDC with Debezium blog post☆28Updated 7 months ago
- End-to-end data engineering project☆54Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA…☆35Updated last year
- Demo on how to use Prefect with Docker☆25Updated 2 years ago
- In this project, we set up an end-to-end data engineering pipeline using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our …☆27Updated last year
- ☆34Updated last year
- Step-by-step instructions to create a production-ready data pipeline☆48Updated 4 months ago
- Maternal Health Risk prediction MLOps pipeline☆43Updated 2 years ago
- A pipeline to detect data drift and retrain the model when there is drift☆23Updated last year
- Data engineering project using UK Bus Open Data Service (BODS) to calculate late buses in real-time for any selected region in England. P…☆28Updated 2 years ago
- This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source to…☆30Updated last year
- Building a Data Pipeline with an Open Source Stack☆54Updated 10 months ago
- A demonstration of an ELT (Extract, Load, Transform) pipeline☆29Updated last year
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/☆11Updated 11 months ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆25Updated last year
- ☆16Updated last year
- I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti…☆29Updated 2 years ago
- A custom end-to-end analytics platform for customer churn☆11Updated 3 months ago
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, PostgreSQL and Superset☆41Updated 5 months ago