LucaCanali / Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPC-DS on PySpark, and how to create histograms with Spark. Also tools for stress testing and measuring CPU performance. Jupyter notebook examples for using various DB systems.
☆453 Updated 2 weeks ago
Alternatives and similar repositories for Miscellaneous
Users who are interested in Miscellaneous are comparing it to the repositories listed below.
- Use the TPC-DS benchmark to test Spark SQL performance ☆180 Updated 5 years ago
- Benchmark Suite for Apache Spark ☆241 Updated 2 years ago
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆250 Updated 6 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆127 Updated 6 months ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 Updated 2 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆764 Updated 3 weeks ago
- The Internals of Spark SQL ☆468 Updated this week
- TPC-DS Kit for Impala ☆171 Updated last year
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆99 Updated last year
- Spark Terasort ☆121 Updated 2 years ago
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆327 Updated last year
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆89 Updated last month
- Qubole Sparklens tool for performance tuning Apache Spark ☆579 Updated last year
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆125 Updated last month
- All the things about TPC-DS in Apache Spark ☆106 Updated 2 years ago
- ☆386 Updated last year
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆425 Updated 3 years ago
- The Internals of Spark Structured Streaming ☆419 Updated 2 years ago
- Spark Shuffle Optimization with RDMA+AEP ☆30 Updated 2 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆284 Updated this week
- The Internals of Delta Lake ☆184 Updated 5 months ago
- Gluten: Plugin to Boost Trino's Performance ☆72 Updated last year
- Performance Analysis Tool ☆76 Updated last month
- ☆310 Updated 6 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆226 Updated 3 months ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆299 Updated last year
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries. ☆339 Updated 2 months ago
- Benchmarks for queries over continuous data streams. ☆349 Updated 6 months ago
- Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines. ☆1,375 Updated this week
- This repository contains the code base for the Open Stream Processing Benchmark. ☆51 Updated 3 years ago