dnguyenngoc / real-time-analytic
This repo gives an introduction to setting up streaming analytics using open source technologies
☆21 · updated last year
Related projects:
- Nyc_Taxi_Data_Pipeline - DE Project (☆62, updated last month)
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar… (☆27, updated last year)
- Spark all the ETL Pipelines (☆29, updated last year)
- Delta-Lake, ETL, Spark, Airflow (☆42, updated last year)
- Building a Data Pipeline with an Open Source Stack (☆36, updated 2 months ago)
- This repository serves as a comprehensive guide to effective data modeling and robust data quality assurance using popular open-source to… (☆22, updated last year)
- Simple stream processing pipeline (☆89, updated 3 months ago)
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… (☆51, updated last year)
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, PostgreSQL and Superset (☆22, updated 2 months ago)
- build dw with dbt (☆26, updated last month)
- A custom end-to-end data pipeline for customer churn (☆9, updated this week)
- End to end data engineering project (☆49, updated last year)
- End to end data engineering project with kafka, airflow, spark, postgres and docker. (☆54, updated last month)
- velib-v2: an ETL pipeline that employs batch and streaming jobs using spark, kafka, airflow, and other tools (☆17, updated last week)
- Data Guy Story commandline (☆12, updated last year)
- Code for my "Efficient Data Processing in SQL" book. (☆47, updated last month)
- This repo contains a spark standalone cluster on docker for anyone who wants to play with PySpark by submitting their applications. (☆22, updated last year)
- Open source stack lakehouse (☆25, updated 6 months ago)
- Building a Modern Data Lake with Minio, Spark, Airflow via Docker. (☆13, updated 4 months ago)
- Repo for CDC with debezium blog post (☆25, updated this week)
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB. (☆36, updated 11 months ago)
- Code for dbt tutorial (☆138, updated 3 months ago)
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka… (☆177, updated 11 months ago)
- Project for "Data pipeline design patterns" blog. (☆41, updated last month)
- Built a real-time streaming pipeline to extract stock data, using Apache Nifi, Debezium, Kafka, and Spark Streaming. Loaded the transform… (☆21, updated 11 months ago)
- Data pipeline that scrapes Rust cheater Steam profiles (☆50, updated 2 years ago)
- This project shows how to capture changes from postgres database and stream them into kafka (☆28, updated 4 months ago)
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… (☆33, updated 9 months ago)
- Data pipeline for extracting, transforming, and visualising Covid-19 data (☆14, updated last year)