abeltavares/real-time-data-pipeline

abeltavares / real-time-data-pipeline

📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.

☆78

Alternatives and similar repositories for real-time-data-pipeline

Users who are interested in real-time-data-pipeline are comparing it to the repositories listed below.

gordonmurray / apache_flink_and_iceberg
A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset
☆26 · Sep 29, 2025 · Updated 10 months ago
Armaan1Gohil / dataengineering-tech-stack
Local Environment to Practice Data Engineering
☆142 · Dec 30, 2024 · Updated last year
dogukannulu / aws_end_to_end_streaming_pipeline
An AWS Data Engineering End-to-End Project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3)
☆17 · Sep 20, 2023 · Updated 2 years ago
DataSQRL / flink-sql-runner
Dockerized runner, utilities, and functions for FlinkSQL applications
☆31 · Updated this week
meetapandit / nyc-citibike-data-pipeline
☆12 · Jul 8, 2024 · Updated 2 years ago
harrydevforlife / building-lakehouse
Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…
☆43 · Dec 15, 2025 · Updated 7 months ago
iamraphson / DE-2024-project-book-recommendation
☆21 · Mar 31, 2024 · Updated 2 years ago
vkondepati / gis-map-viewer
A lightweight, interactive GIS + Graph-based map viewer for spatial exploration and supply chain intelligence.
☆24 · Jul 18, 2026 · Updated last week
tj--- / iceberg-demo
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
☆22 · May 30, 2022 · Updated 4 years ago
alpinegizmo / flink-mobile-data-usage
☆45 · Mar 25, 2022 · Updated 4 years ago
veb-101 / Machine-Learning-Algorithms
One notebook to learn it all - Algorithms from scratch
☆15 · May 26, 2020 · Updated 6 years ago
var1914 / mlops-boilerplate
☆25 · May 17, 2026 · Updated 2 months ago
hksahil / DevOps-For-BigData-MASTERCODES
DevOps For BigData With CI/CD
☆21 · Aug 17, 2024 · Updated last year
fraibacas / lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
☆12 · May 31, 2024 · Updated 2 years ago
abeltavares / batch-data-pipeline
🦆 Batch data pipeline with Airflow, DuckDB, Delta Lake, Trino, MinIO, and Metabase. Full observability and data quality.
☆90 · Nov 5, 2025 · Updated 8 months ago
jess197 / football_statistics_etl_project
☆13 · Dec 28, 2023 · Updated 2 years ago
selfscrum / terraform-aws-dagster-ecs
Create an ECS environment for a distributed Dagster installation
☆12 · Feb 20, 2024 · Updated 2 years ago
augustodn / pyflink-docker
Simple project using pyflink, kafka and postgre containerized using Docker
☆11 · Aug 26, 2024 · Updated last year
faizpuad / DataEngineeringProject-AWSRealtimeCreditCardTrxPipeline
Real-time OLTP system for credit card fraud detection using AWS API Gateway, Kinesis, and RDS PostgreSQL. Features a scalable, serverless…
☆25 · Dec 16, 2024 · Updated last year
K8sAcademy / GoogleCloud-HandsOn
Files for the Docker and Kubernetes on Google Cloud Hands-On labs
☆11 · Mar 14, 2023 · Updated 3 years ago
japerry911 / crypto-data-pipeline
Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more.
☆10 · Jan 23, 2023 · Updated 3 years ago
richban / opendata-stack-platform
Open Data Stack Platform: a collection of projects and pipelines built with open data stack tools for scalable, observable data platform…
☆22 · Updated this week
faizeraza / dataengineering-github-data-pipelineline
In this project I have built an ETL pipeline which scrapes the trending repository based on month, week and day LIVE extract other related info…
☆12 · Sep 9, 2023 · Updated 2 years ago
sanyassyed / DataEngineering_SFO_Eviction_Data_ETL_Pipeline
Data Engineering & Analysis Project- San Francisco Eviction Data ETL Pipeline An end-to-end batch data pipeline for performing ETL on San…
☆10 · Oct 2, 2025 · Updated 9 months ago
irfansofyana / my-data-engineering-learning-path
This repository is about my journey to learn about data engineering
☆10 · Dec 30, 2020 · Updated 5 years ago
Rothamsted / knetbuilder
KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.
☆16 · Apr 2, 2026 · Updated 3 months ago
aws-samples / apache-xtable-on-aws-samples
☆11 · Updated this week
tatwan / airflow-spark-aws-emr
☆12 · Mar 6, 2021 · Updated 5 years ago
aws-samples / streaming-data-lake-flink-cdc-apache-hudi
☆11 · Oct 19, 2023 · Updated 2 years ago
taylor-ortiz / dataexpert-data-engineering-capstone
☆17 · May 23, 2025 · Updated last year
mehd-io / dbt-duckdb-tutorial
This is a simple analytic project using DuckDB & dbt with air quality data.
☆24 · Feb 21, 2024 · Updated 2 years ago
aws-samples / data-engineering-on-aws
☆22 · Oct 21, 2024 · Updated last year
madyark / heart-rate-stream
An end-to-end ELT pipeline to store simulated heart rate data inside a data warehouse; uses Kafka for real-time processing, Airbyte for d…
☆15 · May 28, 2024 · Updated 2 years ago
avirup88 / fintech-fraud_detection-data-pipeline
☆13 · Apr 8, 2025 · Updated last year
hieuimba / Lichess-Spark-DataPipeline
Spark-based pipeline to extract and parse monthly games from the Lichess database.
☆22 · Sep 22, 2025 · Updated 10 months ago
nil1729 / trino-jmx-monitoring
trino monitoring with JMX metrics through Prometheus and Grafana
☆17 · Aug 14, 2024 · Updated last year
pmoskovi / flink-learning-resources
A curated list of Apache Flink learning resources
☆150 · Jan 6, 2025 · Updated last year
knaufk / advent-of-flink-2024
One bite-sized tip or trick for Apache Flink practitioners every day leading up to Christmas Eve 2024.
☆29 · Dec 21, 2024 · Updated last year
NetEase / spark-alarm
Alerting and monitoring tool for Apache Spark
☆23 · May 20, 2022 · Updated 4 years ago