LucaCanali / Miscellaneous
Includes notes on using Apache Spark, with a drill-down on Spark for Physics, how to run TPC-DS on PySpark, and how to create histograms with Spark. Also includes tools for stress testing and measuring CPU performance, and Jupyter notebook examples for using various DB systems.
☆448 · Updated last week
Alternatives and similar repositories for Miscellaneous
Users who are interested in Miscellaneous are comparing it to the libraries listed below.
- Use the TPC-DS benchmark to test Spark SQL performance ☆179 · Updated 5 years ago
- The Internals of Spark SQL ☆466 · Updated 4 months ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆127 · Updated 4 months ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆750 · Updated last week
- Benchmark Suite for Apache Spark ☆241 · Updated 2 years ago
- All the things about TPC-DS in Apache Spark ☆106 · Updated last year
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 · Updated 2 years ago
- TPC-DS Kit for Impala ☆170 · Updated 11 months ago
- Spark Terasort ☆122 · Updated 2 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆122 · Updated last week
- ☆383 · Updated last year
- The Internals of Delta Lake ☆184 · Updated 4 months ago
- TPC-DS benchmark kit with some modifications/fixes ☆334 · Updated last year
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆99 · Updated last year
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆89 · Updated last week
- ACID Data Source for Apache Spark based on Hive ACID ☆97 · Updated 3 years ago
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆246 · Updated 6 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆223 · Updated last month
- ☆309 · Updated 6 years ago
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Updated 4 months ago
- A re-implementation of Hadoop DistCP in Apache Spark ☆47 · Updated last year
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆425 · Updated 3 years ago
- A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer ☆50 · Updated last year
- Qubole Sparklens tool for performance tuning Apache Spark ☆575 · Updated 10 months ago
- Performance Analysis Tool ☆76 · Updated 2 years ago
- Java bindings for https://github.com/facebookincubator/velox ☆25 · Updated this week
- Stocator is a high-performing connector to object storage for Apache Spark, achieving performance by leveraging object storage semantics. ☆114 · Updated last year
- Star Schema Benchmark Tool for Apache Kylin ☆97 · Updated 3 years ago
- Gluten: Plugin to Boost Trino's Performance ☆71 · Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆185 · Updated 2 years ago