The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is intended solely for use in a development environment. Do not use it to run any production workloads.
☆80 · Feb 27, 2023 · Updated 3 years ago
Alternatives and similar repositories for Big-Data-Cluster
Users that are interested in Big-Data-Cluster are comparing it to the libraries listed below.
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆84 · Jan 2, 2025 · Updated last year
- Repository for building docker image, with open-source applications ☆26 · Apr 23, 2024 · Updated 2 years ago
- Demonstrating practical SQL skills through a curated portfolio of solved problems from top coding platforms. ☆52 · Mar 18, 2026 · Updated 2 months ago
- Run Hadoop Cluster within Docker Containers. ☆16 · Mar 6, 2025 · Updated last year
- This project demonstrates real-time data streaming and processing architecture using Kafka, Spark Streaming, and Debezium for capturing C… ☆14 · Oct 24, 2024 · Updated last year
- ☆14 · May 1, 2024 · Updated 2 years ago
- I implemented various ETL processes like loading the data using sqoop from mysql to hdfs, transform the data using Spark and Scala, perfo… ☆10 · Oct 20, 2017 · Updated 8 years ago
- This is the dining reservation system using Google Apps Script. ☆14 · Jan 22, 2024 · Updated 2 years ago
- ☆23 · Feb 5, 2024 · Updated 2 years ago
- Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop Cluster that contains the necessary… ☆31 · Jul 6, 2021 · Updated 4 years ago
- Base hadoop/spark/bigdata image with advanced config loading scripts. ☆11 · Nov 3, 2020 · Updated 5 years ago
- ☆17 · Jul 10, 2022 · Updated 3 years ago
- Delta-Lake, ETL, Spark, Airflow ☆49 · Oct 9, 2022 · Updated 3 years ago
- Here I will be exploring various tools and methods that are used in data engineering process with Python. ☆21 · Jan 4, 2021 · Updated 5 years ago
- Blog API with Django Rest Framework. ☆13 · Jan 4, 2023 · Updated 3 years ago
- Spark application to consume kafka events generated by a python producer. ☆12 · Aug 7, 2021 · Updated 4 years ago
- This Repo contains Jupyter Notebooks to recap on RDD, DataFrame, Spark Streaming and ML operations using Pyspark ☆11 · Nov 3, 2024 · Updated last year
- Marshmallow serializer integration with pyspark ☆12 · Dec 29, 2023 · Updated 2 years ago
- An end-to-end, containerized data pipeline for near-real-time user event analytics using Kafka, ClickHouse, Airflow, and PySpark. Made to… ☆78 · Sep 12, 2025 · Updated 8 months ago
- ☆12 · May 27, 2024 · Updated 2 years ago
- OPC UA simulation server written in Python, which sends out 3 values from a real data set ☆70 · Jan 24, 2023 · Updated 3 years ago
- Extract, transform, and load data for analytic processing using AWS Glue ☆17 · May 2, 2021 · Updated 5 years ago
- Series follows learning from Apache Spark (PySpark) with quick tips and workaround for daily problems in hand ☆55 · Sep 30, 2023 · Updated 2 years ago
- ☆21 · Mar 11, 2025 · Updated last year
- Project - Data Processing and Analysis in Python Course ☆39 · Oct 10, 2018 · Updated 7 years ago
- Data pipeline for extracting, transforming, and visualising Covid-19 data ☆14 · Apr 23, 2023 · Updated 3 years ago
- A Python PySpark Project with Poetry ☆31 · May 2, 2026 · Updated 3 weeks ago
- ☆24 · Dec 31, 2024 · Updated last year
- Tutorial for running Django on Azure ☆16 · Feb 7, 2026 · Updated 3 months ago
- On-premises ELT Pipeline ☆32 · Jul 10, 2025 · Updated 10 months ago
- Parameter Importance according to OpenML ☆14 · Feb 23, 2022 · Updated 4 years ago
- A ready to go Big Data cluster (Hadoop + Hadoop Streaming + Spark + PySpark) with Docker and Docker Swarm! ☆22 · May 20, 2025 · Updated last year
- This repo gives an introduction to setting up streaming analytics using open source technologies ☆25 · Mar 2, 2023 · Updated 3 years ago
- ☆12 · Jul 27, 2021 · Updated 4 years ago
- ☆12 · Jul 22, 2025 · Updated 10 months ago
- Spark implementation of Slowly Changing Dimension type 2 ☆11 · Jan 8, 2019 · Updated 7 years ago
- A minimal docker compose setup for experimenting with cloud agnostic Lakehouse Architectures Apache Spark with Hive Metastore + Delta Lak… ☆34 · Apr 17, 2024 · Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆45 · Jan 4, 2024 · Updated 2 years ago
- A shell script to automate the operations of sqoop ☆11 · Mar 29, 2021 · Updated 5 years ago