Dockerizing an Apache Spark Standalone Cluster
☆42 · Jun 29, 2022 · Updated 3 years ago
Alternatives and similar repositories for apache-spark-docker
Users who are interested in apache-spark-docker are comparing it to the repositories listed below.
- The goal of this project is to identify students at risk of dropping out of school · ☆22 · May 7, 2021 · Updated 4 years ago
- A parallel implementation of the bzip2 data compressor in Python; this data compression pipeline uses algorithms like Burrows–Wheeler… · ☆13 · Jun 29, 2022 · Updated 3 years ago
- Term Frequency-Inverse Document Frequency from Scratch · ☆14 · Sep 19, 2021 · Updated 4 years ago
- Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag · ☆23 · Sep 19, 2022 · Updated 3 years ago
- Challenge Data Engineer · ☆25 · Jun 13, 2022 · Updated 3 years ago
- Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT) · ☆13 · Jun 13, 2022 · Updated 3 years ago
- Sample Project to Learn Data Engineering · ☆10 · Aug 1, 2021 · Updated 4 years ago
- The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such … · ☆123 · Jun 29, 2022 · Updated 3 years ago
- Big Data infrastructure with Hadoop, Spark, Hive and NiFi deployed using Docker Compose. https://doi.org/10.5281/zenodo.18968438 · ☆20 · Mar 11, 2026 · Updated last week
- CI/CD platform using Jenkins, Docker, Sonar, Nexus, JMeter, Selenium, Ansible, AWX, Grafana, Prometheus, Zabbix, Stress-ng · ☆21 · Feb 5, 2026 · Updated last month
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit · ☆20 · Aug 5, 2022 · Updated 3 years ago
- zdh series: Java-based business risk-control engine · ☆13 · Mar 7, 2026 · Updated 2 weeks ago
- Zeppelin docker · ☆16 · Nov 16, 2020 · Updated 5 years ago
- Repository for the Document streaming capstone projects · ☆12 · Nov 17, 2025 · Updated 4 months ago
- This repo is an approach to TDD in machine learning model operations. It covers project structure, testing essentials using pytest with Gi… · ☆15 · Dec 2, 2020 · Updated 5 years ago
- A Python library to simplify batch requests to AWS Services · ☆12 · Apr 25, 2020 · Updated 5 years ago
- Repository for Apache Spark course at Team Data Science · ☆17 · Oct 23, 2020 · Updated 5 years ago
- Local Development of AWS Glue with Docker and Visual Studio Code · ☆14 · Nov 29, 2021 · Updated 4 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks. · ☆10 · Oct 8, 2022 · Updated 3 years ago
- Rasa Chatbot using Django backend and Sockets for communication · ☆12 · Dec 8, 2022 · Updated 3 years ago
- ☆14 · Sep 14, 2021 · Updated 4 years ago
- ☆11 · May 28, 2025 · Updated 9 months ago
- Distributed stock price forecasting system to predict S&P 500 stock prices. · ☆11 · Nov 12, 2021 · Updated 4 years ago
- Power Plant ML Pipeline Application - Apache Spark · ☆12 · Dec 12, 2016 · Updated 9 years ago
- PredictorFinc is a scalable supervised machine learning model that predicts stock price changes with a Decision Tree Regressor using data … · ☆12 · Sep 5, 2023 · Updated 2 years ago
- Multi-container environment with Hadoop, Spark and Hive · ☆232 · May 5, 2025 · Updated 10 months ago
- NLP Model for predicting 17 different languages · ☆16 · Oct 19, 2023 · Updated 2 years ago
- Starting up a Kubernetes cluster with Vagrant, with Gluster, Portworx, Linstor, or StorageOS as storage provider and Traefik as ingress c… · ☆11 · May 25, 2022 · Updated 3 years ago
- Hadoop, Hive and PrestoDB for deployment using Docker · ☆27 · Oct 21, 2025 · Updated 5 months ago
- Spark Projects for the Berkeley Data Science Course · ☆13 · Aug 12, 2015 · Updated 10 years ago
- This sample demonstrates how to use the Microsoft Graph JavaScript SDK to access data in Office 365 from Office Add-ins. · ☆15 · May 26, 2025 · Updated 9 months ago
- A Hadoop cluster based on Docker, including Hive and Spark. · ☆83 · Nov 13, 2022 · Updated 3 years ago
- MultiPaxos and Disk Paxos in TLA+ and PlusCal · ☆13 · Jan 23, 2023 · Updated 3 years ago
- Combination of Dockerized Hortonworks projects and other Hadoop ecosystem components · ☆10 · Oct 11, 2019 · Updated 6 years ago
- This repo creates a full Kubernetes cluster using k3d with MetalLB, Prometheus, cert-manager, and Traefik. · ☆12 · Mar 5, 2026 · Updated 2 weeks ago
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif… · ☆12 · Oct 21, 2022 · Updated 3 years ago
- GPT3 Chrome Extension Starter Kit · ☆16 · Jan 16, 2023 · Updated 3 years ago
- noiseprint2 is a port of noiseprint to TensorFlow 2 and Keras · ☆12 · Feb 20, 2021 · Updated 5 years ago
- Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset · ☆15 · Jul 16, 2017 · Updated 8 years ago