Notes on using Apache Spark, with a drill-down on Spark for Physics, running TPCDS on PySpark, and creating histograms with Spark. Also includes tools for stress testing, measuring CPU performance, and generating I/O latency heat maps, plus Jupyter notebook examples for using various DB systems.
☆462May 19, 2026Updated 3 weeks ago
Alternatives and similar repositories for Miscellaneous
Users that are interested in Miscellaneous are comparing it to the libraries listed below.
- This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…☆825May 19, 2026Updated 3 weeks ago
- Spark-Dashboard is an open-source monitoring solution for Apache Spark that provides real-time performance dashboards using containers an…☆135May 6, 2026Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆96May 11, 2026Updated last month
- Qubole Sparklens tool for performance tuning Apache Spark☆591Jun 26, 2024Updated last year
- Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.☆1,571Updated this week
- The Internals of Delta Lake☆186May 10, 2026Updated last month
- Spark metrics related custom classes and sinks (e.g. Prometheus)☆187Aug 2, 2022Updated 3 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆94Apr 8, 2024Updated 2 years ago
- A library that provides useful extensions to Apache Spark and PySpark.☆238Jun 5, 2026Updated last week
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.☆255Feb 21, 2023Updated 3 years ago
- Hadoop Profiler, or hprofiler, is a tool which is able to analyze on- and off-CPU workloads on distributed computing environments.☆24Jul 7, 2016Updated 9 years ago
- The Internals of Spark SQL☆487Jan 25, 2026Updated 4 months ago
- Intel® Performance Counter Monitor (Intel® PCM)☆3,285Apr 30, 2026Updated last month
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization☆47Aug 23, 2017Updated 8 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆73Mar 14, 2021Updated 5 years ago
- Helpers & syntactic sugar for PySpark.☆62Dec 4, 2025Updated 6 months ago
- The Internals of Apache Spark☆1,546Apr 12, 2026Updated 2 months ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…☆53Jun 17, 2025Updated last year
- JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter☆1,801May 21, 2026Updated 3 weeks ago
- Monitor Apache Spark from Jupyter Notebook☆172May 16, 2022Updated 4 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…☆16May 21, 2026Updated 3 weeks ago
- Scripts for generating Grafana dashboards for monitoring Spark jobs☆240Mar 26, 2015Updated 11 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers.☆335Sep 29, 2023Updated 2 years ago
- Jupyter magics and kernels for working with remote Spark clusters☆1,360Sep 9, 2025Updated 9 months ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.☆430Jan 14, 2022Updated 4 years ago
- User space software for Intel(R) Resource Director Technology☆749Jun 10, 2026Updated last week
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆131Dec 19, 2024Updated last year
- A tool to get better debug info on spark's memory usage☆42Aug 21, 2019Updated 6 years ago
- Apache DataFusion Comet Spark Accelerator☆1,209Updated this week
- Custom Alerts for Ambari server☆12Jul 27, 2015Updated 10 years ago
- ☆14Aug 21, 2021Updated 4 years ago
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs☆979Updated this week
- Sample processing code using Spark 2.1+ and Scala☆51Jun 28, 2020Updated 5 years ago
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.☆1,053Updated this week
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark☆30May 13, 2026Updated last month
- The Internals of Spark Structured Streaming☆420Mar 3, 2026Updated 3 months ago
- Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark☆1,369Aug 22, 2023Updated 2 years ago
- Queries, Dashboards, and Splunk Knowledge Objects to Monitor Oracle Database Metrics☆14Mar 11, 2021Updated 5 years ago
- Spark Structured Streaming via HTTP communication☆18Jul 7, 2022Updated 3 years ago