bartosz25/spark-scala-playground

Sample processing code using Spark 2.1+ and Scala

☆51

Alternatives and similar repositories for spark-scala-playground

Users who are interested in spark-scala-playground are comparing it to the repositories listed below.

phatak-dev / spark-3.0-examples
Examples of Spark 3.0
☆44 · Nov 11, 2020 · Updated 5 years ago
phatak-dev / Statistical-Data-Exploration-Using-Spark-2.0
Data Exploration Using Spark 2.0
☆14 · Apr 17, 2018 · Updated 8 years ago
bartosz25 / spark-playground
Code snippets used in demos recorded for the blog.
☆42 · Apr 30, 2026 · Updated 2 months ago
arrikto / learn-kubeflow
Learn Kubeflow with Arrikto
☆15 · Jan 4, 2022 · Updated 4 years ago
masayuki038 / calcite-arrow-sample
calcite-arrow-sample (WIP)
☆13 · Dec 17, 2017 · Updated 8 years ago
adamw / fp-stack-2020-pres
☆13 · Dec 12, 2020 · Updated 5 years ago
sylvesterdj / LearningScala
☆11 · Apr 15, 2019 · Updated 7 years ago
saikrishnapujari / Spark-Nested-Data-Parser
Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark
☆16 · Jan 22, 2024 · Updated 2 years ago
PacktPublishing / Mastering-Apache-Spark-2x
Mastering Apache Spark 2x, published by Packt
☆17 · Jan 30, 2023 · Updated 3 years ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
gdgt / cmapi
Cloudera Manager CM API Python end-to-end example
☆15 · Aug 29, 2019 · Updated 6 years ago
hortonworks-spark / spark-hive-streaming-sink
A sink to save Spark Structured Streaming DataFrame into Hive table
☆23 · May 7, 2018 · Updated 8 years ago
godatadriven / dbt-data-ai-summit
Code that was used as an example during the Data+AI Summit 2020
☆15 · Mar 8, 2021 · Updated 5 years ago
phatak-dev / java-sizeof
Memory consumption estimator for Scala/Java
☆27 · Nov 24, 2014 · Updated 11 years ago
histogrammar / histogrammar-scala
Scala implementation of Histogrammar, with optional front-ends and back-ends as separate Maven projects.
☆15 · Dec 29, 2023 · Updated 2 years ago
vivek-bombatkar / Databricks-Apache-Spark-2X-Certified-Developer
Databricks - Apache Spark™ - 2X Certified Developer
☆265 · Jul 24, 2020 · Updated 6 years ago
phatak-dev / spark2.0-examples
Examples of Spark 2.0
☆213 · Aug 11, 2021 · Updated 4 years ago
rockthejvm / spark-optimization
The official repository for the Rock the JVM Spark Optimization with Scala course
☆58 · Jun 20, 2026 · Updated last month
justinrmiller / spark-kafka-parquet-example
An example project that combines Spark Streaming, Kafka, and Parquet to transform JSON objects streamed over Kafka into Parquet files in …
☆19 · Jun 22, 2021 · Updated 5 years ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 · May 19, 2026 · Updated 2 months ago
hammerlab / spark-util
low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
rohgar / scala-parallel-programming-3
☆21 · Feb 9, 2017 · Updated 9 years ago
saikrishnapujari / Spark-Drools-Integration
☆23 · Apr 22, 2019 · Updated 7 years ago
oranda / treelog-scalajs
Gives TreeLog a GUI, the ScalaJS ReactTreeView
☆10 · Jun 23, 2016 · Updated 10 years ago
assafmendelson / DataSourceV2
☆23 · Oct 8, 2018 · Updated 7 years ago
univalence / spark-tools
☆46 · Apr 27, 2020 · Updated 6 years ago
kirkhas / zeppelin-notebooks
Kirk's Zeppelin Notebooks
☆11 · May 22, 2018 · Updated 8 years ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
squito / spark-memory
A tool to get better debug info on spark's memory usage
☆42 · Aug 21, 2019 · Updated 6 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
rudolfolah / mermaid-diagram-examples
Examples of diagrams using Mermaid: https://mermaid.js.org/intro/
☆12 · Mar 25, 2023 · Updated 3 years ago
rxin / distributed-aligner
A distributed machine translation aligner implemented in Spark.
☆17 · May 5, 2011 · Updated 15 years ago
ansrivas / spark-structured-streaming
Spark structured streaming with Kafka data source and writing to Cassandra
☆62 · Dec 5, 2019 · Updated 6 years ago
sami-badawi / spark-setup.g8
Template for Scala Spark with Unit Test
☆13 · Jul 24, 2023 · Updated 3 years ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 · May 11, 2026 · Updated 2 months ago
databrickslabs / delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics f…
☆42 · Nov 27, 2023 · Updated 2 years ago
GatsbyNewton / hive-udf
UDF, GenericUDF, UDTF, UDAF
☆11 · Jul 1, 2022 · Updated 4 years ago
redpanda-data / flink-kafka-examples
A repo of Java examples using Apache Flink with flink-connector-kafka
☆10 · Mar 10, 2026 · Updated 4 months ago