LucaCanali/Miscellaneous

LucaCanali / Miscellaneous

Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPCDS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing, measuring CPU performance, and producing I/O latency heat maps, plus Jupyter notebook examples for using various DB systems.

☆464

Alternatives and similar repositories for Miscellaneous

Users that are interested in Miscellaneous are comparing it to the libraries listed below.

LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 · May 19, 2026 · Updated 2 months ago
cerndb / spark-dashboard
Spark-Dashboard is an open-source monitoring solution for Apache Spark that provides real-time performance dashboards using containers an…
☆137 · May 6, 2026 · Updated 2 months ago
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 · May 11, 2026 · Updated 2 months ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,578 · Updated this week
japila-books / delta-lake-internals
The Internals of Delta Lake
☆186 · Jun 18, 2026 · Updated last month
banzaicloud / spark-metrics
Spark metrics related custom classes and sinks (e.g. Prometheus)
☆186 · Aug 2, 2022 · Updated 3 years ago
G-Research / spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
☆239 · Jul 1, 2026 · Updated 3 weeks ago
databricks / spark-sql-perf
☆623 · Feb 26, 2022 · Updated 4 years ago
oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255 · Feb 21, 2023 · Updated 3 years ago
cerndb / Hadoop-Profiler
Hadoop Profiler, or hprofiler, is a tool which is able to analyze on- and off-CPU workloads on distributed computing environments.
☆24 · Jul 7, 2016 · Updated 10 years ago
japila-books / spark-sql-internals
The Internals of Spark SQL
☆488 · Jan 25, 2026 · Updated 6 months ago
intel / pcm
Intel® Performance Counter Monitor (Intel® PCM)
☆3,310 · Updated this week
ibm-research-ireland / sparkoscope
Enabling Spark Optimization through Cross-stack Monitoring and Visualization
☆47 · Aug 23, 2017 · Updated 8 years ago
randolfgeist / oracle_scripts
Collection of oracle scripts
☆25 · Mar 3, 2020 · Updated 6 years ago
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆73 · Mar 14, 2021 · Updated 5 years ago
tubular / sparkly
Helpers & syntactic sugar for PySpark.
☆62 · Dec 4, 2025 · Updated 7 months ago
japila-books / apache-spark-internals
The Internals of Apache Spark
☆1,547 · Jul 18, 2026 · Updated last week
FINRAOS / MegaSparkDiff
⚠️ Archived — This repository is no longer maintained and will not receive updates. A Spark-based data tool which facilitates comparison …
☆53 · Updated this week
krishnan-r / sparkmonitor
Monitor Apache Spark from Jupyter Notebook
☆172 · May 16, 2022 · Updated 4 years ago
uber-common / jvm-profiler
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
☆1,804 · May 21, 2026 · Updated 2 months ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
hammerlab / grafana-spark-dashboards
Scripts for generating Grafana dashboards for monitoring Spark jobs
☆239 · Mar 26, 2015 · Updated 11 years ago
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
MemVerge / splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
☆131 · Dec 19, 2024 · Updated last year
intel / intel-cmt-cat
User space software for Intel(R) Resource Director Technology
☆751 · Updated this week
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
rcpsilva / PCC104_DesignAndAnalysisOfAlgorithms
☆13 · Jul 20, 2026 · Updated last week
tmuth / splunking-oracle
Queries, Dashboards, and Splunk Knowledge Objects to Monitor Oracle Database Metrics
☆14 · Mar 11, 2021 · Updated 5 years ago
pgsentinel / pg_ash_scripts
☆14 · Aug 21, 2021 · Updated 4 years ago
squito / spark-memory
A tool to get better debug info on spark's memory usage
☆42 · Aug 21, 2019 · Updated 6 years ago
monolive / ambari-custom-alerts
Custom Alerts for Ambari server
☆12 · Jul 27, 2015 · Updated 11 years ago
bartosz25 / spark-scala-playground
Sample processing code using Spark 2.1+ and Scala
☆51 · Jun 28, 2020 · Updated 6 years ago
japila-books / spark-structured-streaming-internals
The Internals of Spark Structured Streaming
☆420 · Mar 3, 2026 · Updated 4 months ago
linkedin / dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
☆1,370 · Aug 22, 2023 · Updated 2 years ago
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,059 · Updated this week
NVIDIA / cudf-spark
NVIDIA cuDF for Apache Spark plugin - accelerate Apache Spark with GPUs
☆990 · Updated this week
spoddutur / spark-notes
☆313 · Nov 26, 2018 · Updated 7 years ago