jobmthomas / Docker-Bigdata
☆24Updated 3 years ago
Alternatives and similar repositories for Docker-Bigdata:
Users who are interested in Docker-Bigdata are comparing it to the repositories listed below
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆45Updated last year
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆120Updated this week
- Code for docker images☆39Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆53Updated last year
- A Docker setup running Airflow with the Hadoop ecosystem (Hive, Spark, and Sqoop)☆11Updated 3 years ago
- ☆25Updated 4 years ago
- A simple Spark-powered ETL framework that just works 🍺☆181Updated last month
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…☆94Updated 3 weeks ago
- A simplified, lightweight ETL Framework based on Apache Spark☆585Updated last year
- Spline agent for Apache Spark☆191Updated last week
- Spark Examples☆125Updated 3 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database☆72Updated 3 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....☆75Updated this week
- Quick Guides from Dremio on Several topics☆69Updated 2 months ago
- Data Engineering with Spark and Delta Lake☆96Updated 2 years ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆53Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment.☆38Updated 3 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark.☆68Updated 4 years ago
- DataQuality for BigData☆144Updated last year
- CSD for Apache Airflow☆20Updated 5 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker☆69Updated 2 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0☆97Updated 2 years ago
- Spark on Kubernetes using Helm☆34Updated 4 years ago
- Delta-Lake, ETL, Spark, Airflow☆46Updated 2 years ago
- Spark on Kubernetes infrastructure Helm charts repo☆198Updated 2 years ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.☆486Updated 2 years ago
- Delta Lake examples☆218Updated 5 months ago
- A Python package to submit and manage Apache Spark applications on Kubernetes.☆41Updated 2 weeks ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark)☆41Updated 5 years ago