Dockerizing an Apache Spark Standalone Cluster
☆42Jun 29, 2022Updated 3 years ago
Alternatives and similar repositories for apache-spark-docker
Users that are interested in apache-spark-docker are comparing it to the libraries listed below.
- Dockerizing and Consuming an Apache Livy environment☆13Jun 29, 2022Updated 3 years ago
- Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag☆23Sep 19, 2022Updated 3 years ago
- Challenge Data Engineer☆25Jun 13, 2022Updated 3 years ago
- Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)☆14Jun 13, 2022Updated 3 years ago
- Sample Project to Learn Data Engineering☆10Aug 1, 2021Updated 4 years ago
- The goal of this project is to track the expenses of Uber Rides and Uber Eats through data Engineering processes using technologies such …☆123Jun 29, 2022Updated 3 years ago
- CI/CD platform using Jenkins, docker, Sonar, Nexus, Jmeter, Selenium, Ansible, AWX, Grafana, Prometheus, Zabbix, Stress-ng☆20Apr 26, 2026Updated 3 weeks ago
- ☆11Jul 13, 2020Updated 5 years ago
- ☆10Jun 3, 2023Updated 2 years ago
- zdh series: a Java-based business risk-control engine☆13Apr 27, 2026Updated 3 weeks ago
- This repo is an approach to TDD in machine learning model operation. It covers project structure, testing essentials using pytest with Gi…☆15Dec 2, 2020Updated 5 years ago
- Just a boilerplate for PySpark and Flask☆36Aug 2, 2018Updated 7 years ago
- A Python library to simplify batch requests to AWS Services☆12Apr 25, 2020Updated 6 years ago
- Rainfall is an extensible Java framework to implement custom DSL-based stress and performance tests☆12Mar 31, 2026Updated last month
- Dockerizing a Python Script for Web Scraping and consuming the scraped data using FastAPI (www.metroscubicos.com)☆15Dec 16, 2021Updated 4 years ago
- Local Development of AWS Glue with Docker and Visual Studio Code☆14Nov 29, 2021Updated 4 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- Group project for the WorldQuant University module, risk management.☆13Feb 3, 2019Updated 7 years ago
- ☆14Sep 14, 2021Updated 4 years ago
- ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…☆11Mar 9, 2022Updated 4 years ago
- A docker image with a pre-configured Hive Metastore and a Spark ThriftServer☆19Jan 20, 2020Updated 6 years ago
- A simple CDK app written in Kotlin using Gradle DSL☆12Dec 28, 2018Updated 7 years ago
- A data engineering pipeline for digital marketers.☆11Dec 21, 2018Updated 7 years ago
- PredictorFinc is a scalable supervised machine learning model that predicts stock price change through Decision Tree Regressor using data …☆12Sep 5, 2023Updated 2 years ago
- Multi-container environment with Hadoop, Spark and Hive☆235May 5, 2025Updated last year
- A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.☆11Jul 4, 2021Updated 4 years ago
- Spark Projects for the Berkeley Data Science Course☆13Aug 12, 2015Updated 10 years ago
- While doing this course I worked on multiple technologies, with Python as the base programming language. These are all of the projects I di…☆20Sep 22, 2021Updated 4 years ago
- This sample demonstrates how to use the Microsoft Graph JavaScript SDK to access data in Office 365 from Office Add-ins.☆15May 26, 2025Updated 11 months ago
- ☆12May 24, 2022Updated 4 years ago
- ☆14Jul 12, 2020Updated 5 years ago
- Node-RED Flow (and web page example) for the LLaMA AI model☆11Jul 27, 2023Updated 2 years ago
- Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark☆11May 22, 2018Updated 8 years ago
- Run a Spark job within Amazon EMR☆12Sep 12, 2020Updated 5 years ago
- Analyzing Big Data with Amazon EMR☆12Sep 14, 2020Updated 5 years ago
- Ask questions of your PDF☆10Jun 11, 2023Updated 2 years ago
- A template to create CVs/Resumes with Quarto☆11Jul 17, 2023Updated 2 years ago
- noiseprint2 is a port of noiseprint to TensorFlow 2 and Keras☆12Feb 20, 2021Updated 5 years ago
- Methods for mapping proteomics data on 3D protein structure.☆15Jan 18, 2020Updated 6 years ago