oap-project / raydp
RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
☆313 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for raydp
- A portable Pythonic Data Catalog API powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture t… ☆159 · Updated this week
- Spark RAPIDS plugin - accelerate Apache Spark with GPUs ☆806 · Updated this week
- Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving. ☆90 · Updated 4 months ago
- FeatHub - A stream-batch unified feature store for real-time machine learning ☆315 · Updated 5 months ago
- Distributed SQL Query Engine in Python using Ray ☆238 · Updated last month
- Tracking Ray Enhancement Proposals ☆48 · Updated this week
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations. ☆256 · Updated last year
- Ray-based Apache Beam runner ☆42 · Updated last year
- A repo for all spark examples using Rapids Accelerator including ETL, ML/DL, etc. ☆126 · Updated this week
- Remote shuffle service for Apache Spark to store shuffle data on remote servers. ☆323 · Updated last year
- Point-in-Time optimizations for Apache Spark ☆29 · Updated 9 months ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap… ☆297 · Updated 9 months ago
- Machine learning library of Apache Flink ☆307 · Updated this week
- A S3 Shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads. ☆38 · Updated 6 months ago
- The Internals of Delta Lake ☆182 · Updated last month
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 · Updated 2 months ago
- Uniffle is a high performance, general purpose Remote Shuffle Service. ☆381 · Updated this week
- Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines. ☆1,204 · Updated this week
- Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu… ☆252 · Updated last year
- Use the TPC-DS benchmark to test Spark SQL performance ☆175 · Updated 4 years ago
- Hopsworks - Data-Intensive AI platform with a Feature Store ☆1,159 · Updated this week
- Distributed XGBoost on Ray ☆143 · Updated 4 months ago
- User tools for Spark RAPIDS ☆54 · Updated this week
- Performance optimization for Spark running on Kubernetes ☆85 · Updated 4 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa… ☆704 · Updated 2 months ago