jobmthomas / Docker-Bigdata
☆24 · Updated 3 years ago
Alternatives and similar repositories for Docker-Bigdata
Users that are interested in Docker-Bigdata are comparing it to the libraries listed below
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆99 · Updated 2 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 2 years ago
- ☆26 · Updated 4 years ago
- Spark Examples ☆125 · Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 · Updated 2 years ago
- Code for docker images ☆39 · Updated 2 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆124 · Updated this week
- spark on kubernetes ☆104 · Updated 2 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆46 · Updated last year
- Delta Lake examples ☆225 · Updated 8 months ago
- ☆80 · Updated 2 months ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 · Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 · Updated 4 years ago
- Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi ☆114 · Updated last year
- A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino ☆21 · Updated 3 years ago
- Guide for databricks spark certification ☆58 · Updated 4 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark ☆25 · Updated 5 years ago
- Examples of Spark 3.0 ☆47 · Updated 4 years ago
- Spline agent for Apache Spark ☆194 · Updated this week
- Apache Spark Course Material ☆91 · Updated 2 years ago
- Infrastructure automation to deploy Hadoop, Hive, Spark, Airflow nodes on a docker host ☆20 · Updated 6 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆70 · Updated 4 years ago
- Spark structured streaming examples using version 3.5.1 ☆26 · Updated last year
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆75 · Updated 5 months ago
- PySpark data-pipeline testing and CICD ☆28 · Updated 4 years ago
- A docker using the airflow with Hadoop ecosystem (hive, spark, and sqoop) ☆12 · Updated 4 years ago
- Code snippets used in demos recorded for the blog. ☆37 · Updated 2 weeks ago
- This repository contains code for Spark Streaming ☆22 · Updated 4 years ago
- ☆23 · Updated 4 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 · Updated last year