LucaCanali / Miscellaneous
Includes general notes on using Apache Spark, notes on using Spark for Physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter notebook examples for Spark, and examples for Oracle and other database systems.
☆424 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for Miscellaneous
- Use the TPC-DS benchmark to test Spark SQL performance ☆175 · Updated 4 years ago
- Benchmark Suite for Apache Spark ☆238 · Updated last year
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆708 · Updated 3 months ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆256 · Updated last year
- The Internals of Spark SQL ☆456 · Updated this week
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆127 · Updated last month
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆241 · Updated 5 years ago
- ☆376 · Updated 9 months ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 · Updated this week
- All the things about TPC-DS in Apache Spark ☆104 · Updated last year
- Qubole Sparklens tool for performance tuning Apache Spark ☆568 · Updated 4 months ago
- Spark Terasort ☆123 · Updated last year
- The Internals of Delta Lake ☆183 · Updated last month
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆298 · Updated 10 months ago
- Spark Shuffle Optimization with RDMA+AEP ☆30 · Updated last year
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆323 · Updated last year
- TPC-DS benchmark kit with some modifications/fixes ☆322 · Updated 7 months ago
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆98 · Updated 9 months ago
- Cache File System optimized for columnar formats and object stores ☆183 · Updated 2 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆274 · Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆85 · Updated 7 months ago
- Apache Spark TPC-DS benchmark setup with EMR launch setup ☆15 · Updated 2 years ago
- Performance Analysis Tool ☆76 · Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆187 · Updated last year
- ☆305 · Updated 5 years ago