The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is intended solely for use in a development environment. Do not use it to run production workloads.
☆76Feb 27, 2023Updated 3 years ago
Alternatives and similar repositories for Big-Data-Cluster
Users that are interested in Big-Data-Cluster are comparing it to the libraries listed below
- Hadoop-Hive-Spark cluster + Jupyter on Docker☆84Jan 2, 2025Updated last year
- This project demonstrates real-time data streaming and processing architecture using Kafka, Spark Streaming, and Debezium for capturing C…☆13Oct 24, 2024Updated last year
- Run Hadoop Cluster within Docker Containers.☆16Mar 6, 2025Updated last year
- Demonstrating practical SQL skills through a curated portfolio of solved problems from top coding platforms.☆44Dec 31, 2025Updated 2 months ago
- ☆139Dec 27, 2024Updated last year
- Docker Big Data Tools: This docker-compose file is configured to run multiple nodes. This is a Hadoop Cluster that contains the necessary…☆31Jul 6, 2021Updated 4 years ago
- Big Data Ecosystem Docker☆426Apr 29, 2023Updated 2 years ago
- Public Docker Images for popular services☆49Sep 7, 2025Updated 6 months ago
- Node-RED Flow (and web page example) for the LLaMA AI model☆11Jul 27, 2023Updated 2 years ago
- Denoising GANs -- TensorFlow2 training code for Gaussian denoiser using the GAN framework.☆10Jan 6, 2022Updated 4 years ago
- Angular Frontend for the Spring Boot Microservices series☆13Jun 9, 2024Updated last year
- A partially implemented ODBC driver for the Trino distributed SQL engine☆18Feb 2, 2026Updated last month
- ☆12Jul 27, 2021Updated 4 years ago
- Project - Data Processing and Analysis in Python Course☆39Oct 10, 2018Updated 7 years ago
- Qingdao ship detection☆13Apr 16, 2025Updated 10 months ago
- ☆13Feb 27, 2026Updated last week
- ☆16Apr 1, 2025Updated 11 months ago
- Big Data Inventory Management on AWS (Demand Forecasting, Machine Learning, Dashboarding) : Presented at Carlson School of Management dur…☆11Apr 15, 2020Updated 5 years ago
- A TCP server for testing whether client connections and message sending and receiving work correctly☆11Nov 6, 2018Updated 7 years ago
- This GitHub repository contains a project that automates the provisioning of a Kubernetes (K8s) cluster using Infrastructure as Code (IaC…☆15Oct 19, 2025Updated 4 months ago
- Dependency-free (so far) vanilla JS dashboard UI for the mediamtx streaming server. Dockerized.☆32Feb 2, 2026Updated last month
- A shell script to automate the operations of sqoop☆11Mar 29, 2021Updated 4 years ago
- SoftUni course CSharp OOP Advanced: All tasks with their solutions.☆10Aug 14, 2020Updated 5 years ago
- Is using KoP (Kafka-On-Pulsar) a good idea? Use the scenarios implemented in this repository to check whether Pulsar with KoP enabled is …☆12Nov 3, 2022Updated 3 years ago
- MOM (My Own Messages) Client - A voice for you and your smart contracts☆10Jan 7, 2023Updated 3 years ago
- Repo for transient training paper at ICAC 2019.☆11Oct 5, 2022Updated 3 years ago
- Dockerizing an Apache Spark Standalone Cluster☆42Jun 29, 2022Updated 3 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆45Dec 11, 2023Updated 2 years ago
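- HttpAgent is a high-performance, flexible, and easy-to-use open-source library that provides comprehensive HTTP support, including file transfer, polling, testing tools, real-time communication, request management, media type handling, MessagePack support, and declarative requests, with low resource consumption and high test coverage.☆10Feb 11, 2026Updated 3 weeks ago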
- A Reservation Management App using Firebase Back-end with Flutter for Cross Platform Development.☆13Sep 5, 2022Updated 3 years ago
- TTS utility☆12Aug 2, 2020Updated 5 years ago
- ☆11Jun 4, 2023Updated 2 years ago
- This repository contains auxiliary installation code for self-hosting Studio☆14Oct 29, 2024Updated last year
- Template engine for generating pdf documents☆10Dec 8, 2022Updated 3 years ago
- Hadoop3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.☆11Nov 7, 2019Updated 6 years ago
- End to End Sales Streaming Pipeline (FastAPI, Kafka, Spark, Cassandra, MySQL, Superset)☆10May 26, 2023Updated 2 years ago
- Network- and GPU-aware management of serverless functions at the edge☆15Mar 3, 2023Updated 3 years ago
- Marshmallow serializer integration with pyspark☆12Dec 29, 2023Updated 2 years ago
- ☆10Nov 12, 2021Updated 4 years ago