dnguyenngoc / real-time-analytic
This repo gives an introduction to setting up streaming analytics using open source technologies
☆22 · Updated last year
Related projects
Alternatives and complementary repositories for real-time-analytic
- Nyc_Taxi_Data_Pipeline - DE Project · ☆85 · Updated last month
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar… · ☆30 · Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… · ☆56 · Updated last year
- In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our … · ☆23 · Updated 11 months ago
- Simple stream processing pipeline · ☆92 · Updated 5 months ago
- This repo contains a Spark standalone cluster on Docker for anyone who wants to play with PySpark by submitting their applications. · ☆23 · Updated last year
- build dw with dbt · ☆29 · Updated 3 weeks ago
- ☆36 · Updated last year
- Delta-Lake, ETL, Spark, Airflow · ☆44 · Updated 2 years ago
- ☆38 · Updated 4 months ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke… · ☆18 · Updated 2 months ago
- Projects done in the Data Engineer Nanodegree Program by Udacity.com · ☆94 · Updated last year
- Project for "Data pipeline design patterns" blog. · ☆41 · Updated 3 months ago
- Building a Data Pipeline with an Open Source Stack · ☆38 · Updated 4 months ago
- Data Guy Story commandline · ☆12 · Updated last year
- ☆15 · Updated 9 months ago
- 💜🌈📊 A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Pola… · ☆18 · Updated this week
- Open source stack lakehouse · ☆25 · Updated 8 months ago
- Near real time ETL to populate a dashboard. · ☆70 · Updated 5 months ago
- "1 config, 1 command from Jupyter Notebook to serve Millions of users", Full-stack On-Premises MLOps system for Computer Vision from Data… · ☆42 · Updated 3 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) · ☆44 · Updated last year
- End-to-end data engineering project · ☆51 · Updated 2 years ago
- ☆47 · Updated 2 months ago
- This project aims to build a streaming application to perform real-time analytics of Covid-19 related tweets and deploy an ML model for r… · ☆12 · Updated 3 years ago
- ☆41 · Updated last year
- Series following learning of Apache Spark (PySpark), with quick tips and workarounds for daily problems at hand · ☆42 · Updated last year
- Spark all the ETL Pipelines · ☆32 · Updated last year
- Code for "Efficient Data Processing in Spark" Course · ☆245 · Updated last month
- Crawling data from the TIKI e-commerce platform, designing a data warehouse, implementing an ETL (Extract, Transform, Load) process, and loading the … · ☆14 · Updated last year
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa… · ☆22 · Updated last year