NVIDIA / spark-rapids-examples
A repo for all Spark examples using the RAPIDS Accelerator, including ETL, ML/DL, etc.
☆118 · Updated this week
Related projects:
- Spark RAPIDS MLlib – accelerate Apache Spark MLlib with GPUs ☆63 · Updated this week
- Spark RAPIDS Benchmarks – benchmark sets and utilities for the RAPIDS Accelerator for Apache Spark ☆36 · Updated 2 weeks ago
- User tools for Spark RAPIDS ☆49 · Updated this week
- RAPIDS Accelerator JNI For Apache Spark ☆36 · Updated this week
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs ☆785 · Updated this week
- XGBoost GPU accelerated on Spark example applications ☆51 · Updated 2 years ago
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De… ☆64 · Updated last week
- TPC-DS benchmark kit with some modifications/fixes ☆85 · Updated last month
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries. ☆304 · Updated last month
- The Internals of Delta Lake ☆180 · Updated last month
- Point-in-Time optimizations for Apache Spark ☆29 · Updated 8 months ago
- ☆54 · Updated 8 months ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆193 · Updated this week
- Delta reader for the Ray open-source toolkit for building ML applications ☆40 · Updated 7 months ago
- The Internals of Spark on Kubernetes ☆71 · Updated 2 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 · Updated last month
- A Python Library to support running data quality rules while the spark job is running⚡ ☆161 · Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆82 · Updated 5 months ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆256 · Updated last year
- Flowchart for debugging Spark applications ☆100 · Updated last week
- ☆582 · Updated 2 years ago
- Delta Lake examples ☆201 · Updated 3 months ago
- An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads. ☆423 · Updated 2 years ago
- Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on sing… ☆23 · Updated 11 months ago
- Extensible Rules Engine for custom Dataframe / Dataset validation ☆134 · Updated 4 months ago
- This repository contains the code base for the Open Stream Processing Benchmark. ☆48 · Updated 2 years ago
- All the things about TPC-DS in Apache Spark ☆104 · Updated last year
- Grouped time series forecasting engine ☆36 · Updated last year
- ☆375 · Updated this week
- Spark RAPIDS Container – Docker containers for Spark RAPIDS ☆19 · Updated 2 weeks ago