holdenk/spark-upgrade

holdenk / spark-upgrade

Magic to help Spark pipelines upgrade

☆34

Alternatives and similar repositories for spark-upgrade

Users that are interested in spark-upgrade are comparing it to the libraries listed below.

GoogleCloudPlatform / dataproc-pubsub-spark-streaming
☆31 • Oct 17, 2018 • Updated 7 years ago
zheyuan28 / SparkTaskMetrics
Task Metrics Explorer
☆14 • Apr 2, 2019 • Updated 7 years ago
sbakiu / kubeflow-spark
Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status.
☆53 • May 26, 2022 • Updated 4 years ago
dttung2905 / flink-at-scale
📚 Tech blogs & talks by companies that run Apache Flink in production
☆196 • Jul 5, 2026 • Updated 3 weeks ago
S-C-O-U-T / Pyadomd
A pythonic approach to query SSAS data models.
☆35 • Jun 5, 2022 • Updated 4 years ago
SANSA-Stack / SANSA-DataLake
A library to query heterogeneous data sources uniformly using SPARQL
☆12 • Dec 5, 2023 • Updated 2 years ago
zetaris / lightning-catalog
The Lightning Catalog is an open-source data catalog designed for preparing data at any scale in ad-hoc analytics, data virtualization, …
☆38 • Feb 5, 2026 • Updated 5 months ago
sarthak-sarbahi / data-analytics-minio-spark
☆23 • Dec 19, 2023 • Updated 2 years ago
whosonfirst / go-pubssed
Listen to a Redis PubSub channel and then rebroadcast it over Server-Sent Events (SSE).
☆12 • Updated this week
Nike-Inc / spark-expectations
A Python Library to support running data quality rules while the spark job is running⚡
☆201 • Jul 14, 2026 • Updated 2 weeks ago
big-data-europe / docker-hdfs-filebrowser
A docker image for HDFS FileBrowser. Cloudera Hue with FileBrowser only.
☆11 • Sep 20, 2018 • Updated 7 years ago
apache-spark-on-k8s / ansible
Ansible playbooks for Apache Spark on kube
☆27 • Jul 20, 2017 • Updated 9 years ago
feliperazeek / spark-algebird-amazon-wordcloud
Sample App. Amazon Product Descriptions Wordcloud. Spark Streaming, Algebird, Storehaus, Redis, Scala Scraper, OpenNLP, Play Framework, D…
☆12 • Nov 9, 2015 • Updated 10 years ago
fenetikm / dotfiles
☆13 • Updated this week
brunocfnba / Kubernetes-Airflow
Setup Apache Airflow on Kubernetes
☆10 • Jul 20, 2018 • Updated 8 years ago
hbutani / icebergSQL
Integration of Iceberg table management into Spark SQL
☆11 • Jan 21, 2020 • Updated 6 years ago
bigdatagenomics / bdg-formats
Open source formats for scalable genomic processing systems using Avro. Apache 2 licensed.
☆42 • Feb 13, 2026 • Updated 5 months ago
databricks / security-bucket-brigade
☆31 • Mar 30, 2023 • Updated 3 years ago
mcvoid / cmb
A parser combinator library in Go
☆13 • Feb 17, 2020 • Updated 6 years ago
SmartDataAnalytics / MA-INF-4223-DBDA-Lab
Repository for Lab “Distributed Big Data Analytics” (MA-INF 4223), University of Bonn
☆10 • Aug 11, 2022 • Updated 3 years ago
jasonbaldridge / twitter4j-tutorial
A simple tutorial application for working with Twitter4j using Scala.
☆14 • Feb 26, 2013 • Updated 13 years ago
databricks / diviner
Grouped time series forecasting engine
☆39 • Jun 23, 2023 • Updated 3 years ago
python / editorial-board
Communications of the Python Documentation Editorial Board
☆12 • Jun 15, 2026 • Updated last month
MrPowers / bebe
Filling in the Spark function gaps across APIs
☆50 • Apr 14, 2021 • Updated 5 years ago
oap-project / sql-ds-cache
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
☆37 • Jan 3, 2023 • Updated 3 years ago
yaooqinn / spark-history-cli
CLI tool for querying Apache Spark History Server REST API
☆28 • Mar 22, 2026 • Updated 4 months ago
jess197 / football_statistics_etl_project
☆13 • Dec 28, 2023 • Updated 2 years ago
rockthejvm / spark-performance-tuning
The official repository for the Rock the JVM Spark Optimization 2 course
☆45 • Jun 20, 2026 • Updated last month
jasonk000 / examples
☆15 • Oct 23, 2014 • Updated 11 years ago
brooklyn-data / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆10 • Feb 10, 2023 • Updated 3 years ago
kyleu / scala-js-typescript
A TypeScript-to-Scala.js converter. Designed for parsing definitelytyped.com, powers definitelyscala.com.
☆21 • Jan 9, 2021 • Updated 5 years ago
potix2 / spark-google-spreadsheets
Google Spreadsheets datasource for SparkSQL and DataFrames
☆58 • Jul 24, 2023 • Updated 3 years ago
godatadriven-dockerhub / hive-metastore
Hadoop/Hive/Spark container to perform CI tests
☆10 • Dec 26, 2020 • Updated 5 years ago
joandre / MCL_spark
An implementation of Markov Clustering algorithm for Spark in Scala
☆34 • Sep 10, 2017 • Updated 8 years ago
anbento0490 / tutorials
☆21 • Jan 21, 2023 • Updated 3 years ago
rsanjabi / short-term-rentals-warehouse
Pipeline, warehouse, and visualization tools for investigating the impact of Airbnb short-term rentals on world cities.
☆15 • Jun 9, 2023 • Updated 3 years ago
robcd / scala-either-extras
[now somewhat enhanced] 20-odd line Scala type-class substitute for Scalaz for validation
☆15 • Jun 7, 2012 • Updated 14 years ago
cdfoundation / faq
CDF FAQ
☆11 • Aug 16, 2022 • Updated 3 years ago
Joyan9 / pyspark-learning-journey
☆10 • May 3, 2025 • Updated last year