LucaCanali / Miscellaneous
Includes general notes on using Apache Spark, notes on using Spark for Physics, how to run TPC-DS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, example Jupyter notebooks for Spark, and examples for Oracle and other database systems.
☆439 · Updated 2 weeks ago
Alternatives and similar repositories for Miscellaneous:
Users who are interested in Miscellaneous are comparing it to the libraries listed below.
- Benchmark Suite for Apache Spark ☆239 · Updated last year
- Use the TPC-DS benchmark to test Spark SQL performance ☆177 · Updated 4 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange ☆127 · Updated 3 months ago
- Spark Terasort ☆122 · Updated last year
- The Internals of Spark SQL ☆463 · Updated 2 months ago
- All the things about TPC-DS in Apache Spark ☆104 · Updated last year
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆738 · Updated last week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆120 · Updated this week
- This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid… ☆243 · Updated 5 years ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆257 · Updated 2 years ago
- The Internals of Delta Lake ☆183 · Updated 2 months ago
- TPC-DS Kit for Impala ☆171 · Updated 10 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆221 · Updated last week
- The Internals of Spark Structured Streaming ☆417 · Updated 2 years ago
- ☆382 · Updated last year
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆88 · Updated last year
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆326 · Updated last year
- ACID Data Source for Apache Spark based on Hive ACID ☆97 · Updated 3 years ago
- Qubole Sparklens tool for performance tuning Apache Spark ☆573 · Updated 9 months ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆425 · Updated 3 years ago
- Profiler for large-scale distributed Java applications (Spark, Scalding, MapReduce, Hive, ...) on YARN. ☆126 · Updated 6 years ago
- An extension of Yahoo's Benchmarks ☆107 · Updated last year
- ☆308 · Updated 6 years ago
- Cache File System optimized for columnar formats and object stores ☆183 · Updated 2 years ago
- Monitor Apache Spark from Jupyter Notebook ☆172 · Updated 2 years ago
- TPC-H queries in Apache Spark SQL using native DataFrames API ☆98 · Updated last year
- Apache Spark TPC-DS benchmark setup with EMR launch setup ☆16 · Updated 2 years ago
- A tool to get better debug info on Spark's memory usage ☆42 · Updated 5 years ago
- Spark Shuffle Optimization with RDMA+AEP ☆30 · Updated last year
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆185 · Updated 2 years ago