juliopez / Hadoop
Infraestructura para Big Data : Hadoop + NiFi +Spark + Hive usando Docker
☆19Updated 2 years ago
Alternatives and similar repositories for Hadoop:
Users that are interested in Hadoop are comparing it to the libraries listed below
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- Zeppelin docker☆15Updated 4 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆45Updated last year
- Hadoop, Hive, Parquet and Hue in docker-compose v3☆42Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆53Updated last year
- The demo of using Kafka, Spark, Hive, Cassandra, etc by using Docker. It produces the production ready environment for any kinds of big d…☆32Updated 5 years ago
- ☆47Updated last year
- Delta-Lake, ETL, Spark, Airflow☆46Updated 2 years ago
- A docker using the airflow with Hadoop ecosystem (hive, spark, and sqoop)☆11Updated 3 years ago
- This is a GitHub for all of my NiFi Templates☆46Updated 4 years ago
- The Ultimate Hands-On Hadoop - Tame your Big Data!: https://www.udemy.com/the-ultimate-hands-on-hadoop-tame-your-big-data/☆8Updated 6 years ago
- Infrastructure automation to deploy Hadoop,Hive,Spark,airflow nodes on a docker host☆20Updated 6 years ago
- Big Data Docker Data Science Spark Spark3 Hadoop HDFS Scala Python Artificial Intelligence Machine Learning Jupyter Lab Notebook☆16Updated last week
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered…☆16Updated 5 years ago
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres+ Debezium CDC, MySQL,…☆28Updated 2 months ago
- Set up a 3 node spark cluster using docker containers☆34Updated 7 years ago
- Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop Cluster that contains the necessary…☆28Updated 3 years ago
- End-to-end Kafka Streaming Examples on Databricks with Evolving Avro Schemas.☆9Updated last year
- Building Big Data Pipelines with Apache Beam, published by Packt☆86Updated last year
- Code Repository for AWS Certified Big Data Specialty 2019 - In Depth and Hands On!, published by Packt☆39Updated last year
- ☆24Updated 3 years ago
- Materials for the next course☆24Updated 2 years ago
- Apche Spark Structured Streaming with Kafka using Python(PySpark)☆41Updated 5 years ago
- Micro Data Lake based on Docker Compose☆17Updated 4 years ago
- Hadoop-Hive-Spark cluster + Jupyter on Docker☆69Updated 2 months ago
- ☆25Updated 4 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark☆16Updated last year
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Updated 2 years ago
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)☆120Updated 3 years ago