LucaCanali / Miscellaneous
Includes notes on using Apache Spark in general, notes on using Spark for physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for CPU performance testing, Jupyter notebook examples for Spark, and examples for Oracle and other database systems.
☆424 Updated 2 months ago
Related projects
Alternatives and complementary repositories for Miscellaneous
- Use the TPC-DS benchmark to test Spark SQL performance ☆175 Updated 4 years ago
- Benchmark Suite for Apache Spark ☆238 Updated last year
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆705 Updated 3 months ago
- The Internals of Spark SQL ☆454 Updated 2 months ago
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆241 Updated 5 years ago
- Spark Terasort ☆123 Updated last year
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆256 Updated last year
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆127 Updated last month
- All the things about TPC-DS in Apache Spark ☆104 Updated last year
- ☆376 Updated 9 months ago
- ☆305 Updated 5 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 Updated 3 months ago
- TPC-DS Kit for Impala ☆170 Updated 5 months ago
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆98 Updated 9 months ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆84 Updated 7 months ago
- Performance Analysis Tool ☆76 Updated last year
- Qubole Sparklens tool for performance tuning Apache Spark ☆568 Updated 4 months ago
- The Internals of Delta Lake ☆182 Updated last month
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆323 Updated last year
- TPC-DS benchmark kit with some modifications/fixes ☆321 Updated 6 months ago
- A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has been migrated to Apa… ☆171 Updated 2 years ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆424 Updated 2 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆274 Updated last month
- Cache File System optimized for columnar formats and object stores ☆183 Updated 2 years ago