rodalbuyeh/pyspark-k8s-boilerplate

rodalbuyeh / pyspark-k8s-boilerplate

Boilerplate for PySpark on Cloud Kubernetes

☆33

Alternatives and similar repositories for pyspark-k8s-boilerplate

Users who are interested in pyspark-k8s-boilerplate are comparing it to the repositories listed below.

mehd-io / pyspark-boilerplate-mehdio
Pyspark boilerplate for running prod ready data pipeline
☆29 · Mar 17, 2021 · Updated 5 years ago
codingvarun / streaming-elt-pipeline
This is a real-life, high throughput streaming ELT data pipeline for ecommerce
☆15 · May 22, 2023 · Updated 3 years ago
ziritrion / mlopszoomcamp
Homework and notes for the DataTalks.Club MLOps Zoomcamp
☆11 · Sep 10, 2022 · Updated 3 years ago
hrchlhck / k8s-bigdata
Apache Spark with HDFS cluster within Kubernetes
☆12 · Jul 11, 2023 · Updated 3 years ago
vuthanhhai2302 / Applied-Pyspark
My applied big data analytic project with pyspark.
☆10 · Sep 21, 2022 · Updated 3 years ago
cevoaustralia / glue-vscode
Local Development of AWS Glue with Docker and Visual Studio Code
☆14 · Nov 29, 2021 · Updated 4 years ago
guidok91 / spark-movies-etl
Spark data pipeline that processes movie ratings data.
☆31 · Jul 12, 2026 · Updated last week
antimoz-om / Antimoz
A data engineering pipeline for digital marketers.
☆11 · Dec 21, 2018 · Updated 7 years ago
DIYBigData / spark-data-analysis-projects
A collection of data analysis projects done using PySpark via Jupyter notebooks.
☆10 · Oct 8, 2022 · Updated 3 years ago
itversity / retail_db_json
☆14 · Sep 14, 2021 · Updated 4 years ago
soyelherein / pyspark-cicd-template
PySpark data-pipeline testing and CICD
☆28 · Oct 28, 2020 · Updated 5 years ago
ryanwithawhy / generate_temp_table_sql
This package takes a CSV and generates SQL statements to create a temp table and insert the data within it into the temp table. Good whe…
☆16 · May 28, 2024 · Updated 2 years ago
LeonardoEmili / stock-price-forecasting
Distributed stock price forecasting system to predict S&P 500 stock prices.
☆11 · Nov 12, 2021 · Updated 4 years ago
RishiSankineni / Machine-Learning-Pipeline-LR-Pyspark
Power Plant ML Pipeline Application - Apache Spark
☆12 · Dec 12, 2016 · Updated 9 years ago
nancyyanyu / kafka_stock
A financial data processing and visualization platform using Apache Kafka, Apache Cassandra, and Bokeh.
☆74 · Oct 5, 2021 · Updated 4 years ago
adityajain10 / pyspark-mlib-based-stock-predictor
PredictorFinc is a scalable supervised machine learning model the predicts stock price change through Decision Tree Regressor using data …
☆12 · Sep 5, 2023 · Updated 2 years ago
prakashdontaraju / google-cloud-ecommerce
ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…
☆11 · Mar 9, 2022 · Updated 4 years ago
mrodrig / firewalla-apcupsd
Firewalla Scripts for APC UPS Daemon
☆13 · Dec 13, 2020 · Updated 5 years ago
mateuspicanco / project-atlas-sao-paulo
A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.
☆12 · Jul 4, 2021 · Updated 5 years ago
clarknova99 / home-cluster
My home Kubernetes cluster, managed by flux
☆19 · Updated this week
altendky / graham
Graham, making s'mores with attrs and marshmallow.
☆12 · Sep 24, 2024 · Updated last year
mach-kernel / databricks-kube-operator
A Kubernetes operator to enable GitOps style deploys for Databricks resources
☆16 · Jun 3, 2025 · Updated last year
vsouza / spark-kinesis-redshift
Example project for consuming AWS Kinesis streamming and save data on Amazon Redshift using Apache Spark
☆11 · May 22, 2018 · Updated 8 years ago
camposvinicius / gcp-etl
This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…
☆15 · Nov 15, 2021 · Updated 4 years ago
ketgo / marshmallow-pyspark
Marshmallow serializer integration with pyspark
☆12 · Dec 29, 2023 · Updated 2 years ago
francescotescari / noiseprint2
noiseprint2 is a porting of noiseprint to tensorflow 2 and keras
☆12 · Feb 20, 2021 · Updated 5 years ago
clips / yarn
Disambiguating biomedical and clinical concepts with word embeddings
☆15 · Apr 17, 2018 · Updated 8 years ago
AWS-Big-Data-Projects / Analysing-Census-Data-using-aws
Use aws-emr and aws-redshift to analyse dataset of adult census of USA
☆13 · Sep 11, 2020 · Updated 5 years ago
sanic-org / tracerite
Tracebacks for Humans (in Jupyter notebooks)
☆12 · Updated this week
mpfishe2 / az-databricks-realtime-alert-system
Building a real-time alert monitoring pipeline that sends email notifications off of Azure Event Hubs, Azure Databricks, and a Azure Logi…
☆13 · Mar 8, 2020 · Updated 6 years ago
chaithanya21 / Sentiment-Analysis-using-Pyspark-on-Multi-Social-Media-Data
In this mini-project i have chosen to do sentiment analysis of social media websites such as twitter and reddit to gain insights into the…
☆12 · Mar 5, 2020 · Updated 6 years ago
big-data-lab-team / accident-prediction-montreal
☆12 · Dec 8, 2022 · Updated 3 years ago
rvilla87 / ETL-PySpark
ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS)
☆17 · Dec 18, 2018 · Updated 7 years ago
Ceci-Aguilera / habaneras_de_lino_api
Version 1 of Habaneras de Lino is an online ecommerce. This repo contains the backed api of the website using Django and Django Rest Fram…
☆13 · Dec 16, 2022 · Updated 3 years ago
yilong2001 / spark-sql-on-k8s
The simplest production deployment solution for Spark SQL on Kubernetes
☆19 · Jun 12, 2023 · Updated 3 years ago
innat / Transfer-Learning-PySpark
Multi-Class Classification | Transfer Learning With PySpark
☆13 · Nov 12, 2019 · Updated 6 years ago
Kuntal-G / BigData-Analytics
Analytics projects using Big Data eco-systems (Hadoop, Spark, Storm)
☆17 · Dec 27, 2021 · Updated 4 years ago
aws-solutions-library-samples / guidance-for-sql-based-etl-with-apache-spark-on-amazon-eks
A guidance that provides declarative data processing capability, and workflow orchestration automation to help your business users (such …
☆30 · Aug 29, 2025 · Updated 10 months ago
Martialhimanshu / GaanaSuno
GaanaSuno is an application that lets users upload, store and play all of your music from the cloud. Additionally, a user can comment and…
☆12 · Aug 18, 2018 · Updated 7 years ago