mtpatter / time-series-kafka-demo
Fully reproducible, Dockerized, step-by-step tutorial on how to mock a "real-time" Kafka data stream from a timestamped CSV file. A detailed blog post is published on Towards Data Science.
☆37 Updated 2 years ago
Related projects
Alternatives and complementary repositories for time-series-kafka-demo
- A Series of Notebooks on how to start with Kafka and Python ☆153 Updated last year
- build dw with dbt ☆29 Updated 2 weeks ago
- A series of Jupyter notebooks that walk you through Machine Learning with Apache Spark ecosystem using Spark MLlib, PyTorch and TensorFlo… ☆75 Updated last year
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆21 Updated 2 years ago
- A Postgres data warehouse for processing synthetic data using IAC principles ☆16 Updated last year
- Materials of the Official Helm Chart Webinar ☆27 Updated 3 years ago
- Project for real-time anomaly detection using Kafka and Python ☆56 Updated last year
- Data Engineering with Spark and Delta Lake ☆88 Updated last year
- Delta-Lake, ETL, Spark, Airflow ☆44 Updated 2 years ago
- Here I will be exploring various tools and methods that are used in the data engineering process with Python. ☆22 Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- A course by DataTalks Club that covers Spark, Kafka, Docker, Airflow, Terraform, dbt, BigQuery, etc. ☆11 Updated 2 years ago
- Scaling Machine Learning in Three Week course in collaboration with O'Reilly, following the guidance of Adi Polak's book - Scaling Machi… ☆23 Updated last year
- ☆32 Updated 11 months ago
- Project for "Data pipeline design patterns" blog. ☆41 Updated 3 months ago
- Python ETL demo for Hackforge ☆31 Updated last year
- Build a scikit-learn model to predict churn using customer telco data. ☆14 Updated last year
- ☆37 Updated 4 months ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆83 Updated 6 months ago
- Delta Lake examples ☆205 Updated last month
- ☆86 Updated 2 years ago
- Delta Lake Documentation ☆46 Updated 4 months ago
- ☆11 Updated 2 years ago
- DataTalks Workshop Materials ☆18 Updated 7 months ago
- Spark on Kubernetes using Helm ☆34 Updated 4 years ago
- Challenge Data Engineer ☆25 Updated 2 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa… ☆22 Updated last year