EthicalML / kafka-spark-streaming-zeppelin-docker
One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)
☆119 Updated 3 years ago
Alternatives and similar repositories for kafka-spark-streaming-zeppelin-docker:
Users who are interested in kafka-spark-streaming-zeppelin-docker are comparing it to the libraries listed below.
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆97 Updated last year
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- spark on kubernetes ☆105 Updated last year
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆68 Updated 3 years ago
- This repository contains code for Spark Streaming ☆21 Updated 3 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆114 Updated this week
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆41 Updated 5 years ago
- Spark on Kubernetes infrastructure Helm charts repo ☆200 Updated 2 years ago
- Spark Examples ☆125 Updated 2 years ago
- Spark on Kubernetes using Helm ☆34 Updated 4 years ago
- A simple Spark-powered ETL framework that just works 🍺 ☆178 Updated last year
- The Internals of Delta Lake ☆183 Updated this week
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆126 Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- ☆47 Updated 5 months ago
- DataQuality for BigData ☆143 Updated last year
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Real-world Spark pipelines examples ☆84 Updated 6 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆118 Updated last month
- Code and presentation for Strata Model Serving tutorial ☆68 Updated 5 years ago
- Apache Spark Course Material ☆86 Updated last year
- Flowchart for debugging Spark applications ☆104 Updated 3 months ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆41 Updated last year
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆93 Updated this week
- CSD for Apache Airflow ☆20 Updated 5 years ago
- How to build an awesome data engineering team ☆99 Updated 5 years ago
- Materials of the Official Helm Chart Webinar ☆27 Updated 3 years ago
- Pyspark boilerplate for running prod ready data pipeline ☆28 Updated 3 years ago