The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is intended solely for use in a development environment. Do not use it to run any production workloads.
☆80 · Feb 27, 2023 · Updated 3 years ago
Alternatives and similar repositories for Big-Data-Cluster
Users that are interested in Big-Data-Cluster are comparing it to the libraries listed below.
- Hadoop-Hive-Spark cluster + Jupyter on Docker ☆84 · Jan 2, 2025 · Updated last year
- Repository for building docker image, with open-source applications ☆26 · Apr 23, 2024 · Updated 2 years ago
- Demonstrating practical SQL skills through a curated portfolio of solved problems from top coding platforms. ☆51 · Mar 18, 2026 · Updated 3 months ago
- Run Hadoop Cluster within Docker Containers. ☆16 · Mar 6, 2025 · Updated last year
- This project demonstrates real-time data streaming and processing architecture using Kafka, Spark Streaming, and Debezium for capturing C… ☆14 · Oct 24, 2024 · Updated last year
- I implemented various ETL processes like loading the data using sqoop from mysql to hdfs, transform the data using Spark and Scala, perfo… ☆10 · Oct 20, 2017 · Updated 8 years ago
- ☆159 · Dec 27, 2024 · Updated last year
- Delta-Lake, ETL, Spark, Airflow ☆49 · Oct 9, 2022 · Updated 3 years ago
- Here I will be exploring various tools and methods that are used in data engineering process with Python. ☆21 · Jan 4, 2021 · Updated 5 years ago
- Big Data Ecosystem Docker ☆429 · Apr 29, 2023 · Updated 3 years ago
- ETHome is an open-source blockchain based energy community controller ☆11 · Feb 16, 2022 · Updated 4 years ago
- Public Docker Images for popular services ☆53 · Sep 7, 2025 · Updated 9 months ago
- This repo was created for the Machine Learning with Python training sessions held over 3-5 days. ☆20 · Oct 9, 2020 · Updated 5 years ago
- Blog API with Django Rest Framework. ☆13 · Jan 4, 2023 · Updated 3 years ago
- This Repo contains Jupyter Notebooks to recap on RDD, DataFrame, Spark Streaming and ML operations using Pyspark ☆11 · Nov 3, 2024 · Updated last year
- Marshmallow serializer integration with pyspark ☆12 · Dec 29, 2023 · Updated 2 years ago
- An end-to-end, containerized data pipeline for near-real-time user event analytics using Kafka, ClickHouse, Airflow, and PySpark. Made to… ☆79 · Sep 12, 2025 · Updated 9 months ago
- Data engineering mentorship program ☆206 · Feb 21, 2026 · Updated 3 months ago
- Extract, transform, and load data for analytic processing using AWS Glue ☆17 · May 2, 2021 · Updated 5 years ago
- Binary Particle Swarm Optimization applied to the unit commitment problem in an electric microgrid. ☆15 · Jun 22, 2019 · Updated 6 years ago
- TrafficAdvisor: a Real-Time Traffic Monitoring System ☆14 · Sep 10, 2018 · Updated 7 years ago
- Series follows learning from Apache Spark (PySpark) with quick tips and workaround for daily problems in hand ☆55 · Sep 30, 2023 · Updated 2 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆42 · Jun 29, 2022 · Updated 3 years ago
- Data pipeline for extracting, transforming, and visualising Covid-19 data ☆14 · Apr 23, 2023 · Updated 3 years ago
- A Python PySpark Project with Poetry ☆31 · May 2, 2026 · Updated last month
- On-premises ELT Pipeline ☆32 · Jul 10, 2025 · Updated 11 months ago
- Parameter Importance according to OpenML ☆14 · Feb 23, 2022 · Updated 4 years ago
- This repo gives an introduction to setting up streaming analytics using open source technologies ☆25 · Mar 2, 2023 · Updated 3 years ago
- ☆12 · Jul 27, 2021 · Updated 4 years ago
- ☆12 · Jul 22, 2025 · Updated 10 months ago
- Spark implementation of Slowly Changing Dimension type 2 ☆11 · Jan 8, 2019 · Updated 7 years ago
- ☆10 · Jul 24, 2024 · Updated last year
- Small data engineering tutorial ☆10 · Oct 24, 2018 · Updated 7 years ago
- Docker powered container for using Nginx as reverse-proxy in combination with an OpenVPN Client. ☆11 · Jan 1, 2020 · Updated 6 years ago
- Now updated prior to the version on CRAN. ☆15 · Jan 9, 2024 · Updated 2 years ago
- R package for Markov regime-switching models ☆12 · Jan 23, 2018 · Updated 8 years ago
- ☆17 · Apr 1, 2025 · Updated last year
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆45 · Jan 4, 2024 · Updated 2 years ago
- A shell script to automate the operations of sqoop ☆11 · Mar 29, 2021 · Updated 5 years ago