tech4242 / docker-hadoop-hive-parquet
Hadoop, Hive, Parquet and Hue in docker-compose v3
☆42 Updated 5 years ago
Alternatives and similar repositories for docker-hadoop-hive-parquet
Users who are interested in docker-hadoop-hive-parquet are comparing it to the repositories listed below.
- Real-world Spark pipelines examples ☆83 Updated 7 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- Infrastructure automation to deploy Hadoop, Hive, Spark, Airflow nodes on a docker host ☆20 Updated 6 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆70 Updated 4 years ago
- Examples of Spark 3.0 ☆47 Updated 4 years ago
- DataQuality for BigData ☆144 Updated last year
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 Updated 6 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 6 years ago
- ITSumma Spark Greenplum Connector ☆38 Updated last year
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆124 Updated this week
- A docker image with a pre-configured Hive Metastore and a Spark ThriftServer ☆19 Updated 5 years ago
- ☆24 Updated 4 years ago
- spark on kubernetes ☆104 Updated 2 years ago
- Delta Lake Examples ☆12 Updated 5 years ago
- Docker image for Apache Hive Metastore ☆71 Updated 2 years ago
- ☆32 Updated 7 years ago
- Spark implementation of Slowly Changing Dimension type 2 ☆11 Updated 6 years ago
- Postgresql configured to work as metastore for Hive. ☆32 Updated 2 years ago
- Docker multi-nodes Hadoop cluster with Spark 2.4.1 on Yarn ☆51 Updated 4 years ago
- The official repository for the Rock the JVM Spark Optimization 2 course ☆40 Updated last year
- Code for docker images ☆39 Updated 2 years ago
- Yet Another (Spark) ETL Framework ☆21 Updated last year
- ☆75 Updated 5 years ago
- Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop Cluster that contains the necessary… ☆30 Updated 3 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆99 Updated 2 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆95 Updated last week
- PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 ☆86 Updated 5 years ago
- Deploy your Spark Production Cluster on Kubernetes ☆47 Updated 4 years ago
- Presto Trino with Apache Hive Postgres metastore ☆42 Updated 9 months ago