ray-project/raydp

ray-project / raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

☆374

Alternatives and similar repositories for raydp

Users that are interested in raydp are comparing it to the libraries listed below.

ray-project / deltacat
A portable Multimodal Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to you…
☆282 · Apr 17, 2026 · Updated 3 months ago
ray-project / mobius
Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving.
☆105 · Jun 21, 2024 · Updated 2 years ago
ray-project / kuberay
A toolkit to run Ray applications on Kubernetes
☆2,596 · Updated this week
datafusion-contrib / ray-sql
Distributed SQL Query Engine in Python using Ray
☆245 · Oct 2, 2024 · Updated last year
NVIDIA / cudf-spark
NVIDIA cuDF for Apache Spark plugin - accelerate Apache Spark with GPUs
☆990 · Updated this week
ray-project / ray_shuffling_data_loader
A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra…
☆18 · Jan 5, 2023 · Updated 3 years ago
ryantd / veloce
WIP. Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous.
☆17 · Aug 4, 2022 · Updated 3 years ago
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130 · Sep 23, 2025 · Updated 9 months ago
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255 · Feb 21, 2023 · Updated 3 years ago
ray-project / ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
☆43,297 · Updated this week
intel / BigDL
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
☆2,700 · Jun 12, 2026 · Updated last month
antgroup / ant-ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay i…
☆170 · Jul 4, 2026 · Updated 2 weeks ago
facebookincubator / velox
A composable and fully extensible C++ execution engine library for data management systems.
☆4,173 · Updated this week
ray-project / ray-llm
RayLLM - LLMs on Ray (Archived). Read README for more info.
☆1,262 · Mar 13, 2025 · Updated last year
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆11 · Feb 1, 2021 · Updated 5 years ago
Eventual-Inc / Daft
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
☆5,644 · Updated this week
ray-project / mlflow-ray-serve
MLFlow Deployment Plugin for Ray Serve
☆47 · Apr 12, 2022 · Updated 4 years ago
ray-project / ray_beam_runner
Ray-based Apache Beam runner
☆42 · Aug 30, 2023 · Updated 2 years ago
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,056 · Updated this week
ray-project / enhancements
Tracking Ray Enhancement Proposals
☆68 · Jun 1, 2026 · Updated last month
mars-project / mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
☆2,741 · Jan 2, 2024 · Updated 2 years ago
exoshuffle / cloudsort
Exoshuffle-CloudSort
☆30 · Mar 2, 2023 · Updated 3 years ago
project-codeflare / rayvens
Rayvens makes it possible for data scientists to access hundreds of data services within Ray with little effort.
☆50 · Nov 29, 2022 · Updated 3 years ago
ray-project / xgboost_ray
Distributed XGBoost on Ray
☆153 · Jun 25, 2024 · Updated 2 years ago
oap-project / remote-shuffle
Spark* shuffle plugin for support shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local-dis…
☆21 · Mar 15, 2024 · Updated 2 years ago
lance-format / lance-ray
Integration between Lance and Ray for distributed data processing
☆35 · Updated this week
apache / auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,778 · Updated this week
pytorch / torcharrow
High performance model preprocessing library on PyTorch
☆641 · Mar 29, 2024 · Updated 2 years ago
ray-project / ray_lightning
Pytorch Lightning Distributed Accelerators using Ray
☆215 · Nov 3, 2023 · Updated 2 years ago
ray-project / langchain-ray
Examples on how to use LangChain and Ray
☆231 · Jun 14, 2023 · Updated 3 years ago
uber / petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f…
☆1,888 · Jan 2, 2026 · Updated 6 months ago
byzer-org / byzer-lang
Byzer (former MLSQL): A low-code open-source programming language for data pipeline, analytics and AI.
☆1,835 · May 29, 2024 · Updated 2 years ago
lance-format / lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve…
☆6,825 · Updated this week
oap-project / cloudtik
Cloud Scale Platform for Distributed Analytics and AI
☆24 · Oct 12, 2023 · Updated 2 years ago
ray-project / rayfed
A multiple parties joint, distributed execution engine based on Ray, to help build your own federated learning frameworks in minutes.
☆96 · Aug 28, 2024 · Updated last year
kubeflow / spark-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
☆3,140 · Updated this week
oap-project / oap-mllib
Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
☆22 · Jul 6, 2026 · Updated 2 weeks ago
feature-store / ralf
☆30 · Aug 31, 2022 · Updated 3 years ago