jobmthomas / Docker-Bigdata
☆24Updated 3 years ago
Related projects
Alternatives and complementary repositories for Docker-Bigdata
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆53Updated last year
- Spark Examples☆124Updated 2 years ago
- Spline agent for Apache Spark☆185Updated this week
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆40Updated 11 months ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark)☆41Updated 5 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow☆32Updated 4 years ago
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- Apache Spark 3 - Structured Streaming Course Material☆119Updated last year
- Extensible streaming ingestion pipeline on top of Apache Spark☆44Updated 8 months ago
- A simple Spark-powered ETL framework that just works 🍺☆178Updated 11 months ago
- Zeppelin docker☆15Updated 4 years ago
- Spark data pipeline that processes movie ratings data.☆27Updated last week
- Apache Spark Course Material☆85Updated last year
- A Spark Atlas connector to track data lineage in Apache Atlas☆264Updated 2 years ago
- WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging …☆30Updated 4 months ago
- A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino☆16Updated 2 years ago
- A simplified, lightweight ETL Framework based on Apache Spark☆584Updated 9 months ago
- This repository contains code for Spark Streaming☆21Updated 3 years ago
- ☆26Updated 4 years ago
- Multi-container environment with Hadoop, Spark and Hive☆203Updated 10 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆96Updated last year
- Docker with Airflow and Spark standalone cluster☆246Updated last year
- A Docker setup using Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)☆11Updated 3 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker☆61Updated 5 months ago
- ☆78Updated last year
- Base Docker image with just essentials: Hadoop, Hive and Spark.☆67Updated 3 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.☆39Updated 3 years ago
- ETL pipeline using pyspark (Spark - Python)☆108Updated 4 years ago
- The official repository for the Rock the JVM Spark Optimization 2 course☆37Updated 11 months ago