amplab/keystone

amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.

☆473

Alternatives and similar repositories for keystone

Users who are interested in keystone are comparing it to the libraries listed below.

amplab / velox-modelserver
☆110 · Apr 17, 2017 · Updated 9 years ago

collectivemedia / spark-ext
Spark Extension : ML transformers, SQL aggregations, etc that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago

amplab / SparkNet
Distributed Neural Networks for Spark
☆609 · Jul 23, 2020 · Updated 5 years ago

zhangyuc / splash
Splash Project for parallel stochastic learning
☆93 · Jun 16, 2017 · Updated 9 years ago

linkedin / photon-ml
A scalable machine learning library on Apache Spark
☆797 · Aug 30, 2021 · Updated 4 years ago

amplab / spark-indexedrdd
An efficient updatable key-value store for Apache Spark
☆255 · Mar 11, 2017 · Updated 9 years ago

sryza / spark-timeseries
A library for time series analysis on Apache Spark
☆1,197 · Oct 13, 2020 · Updated 5 years ago

amplab / MLI
An API for Distributed Machine Learning
☆156 · Sep 22, 2016 · Updated 9 years ago

h2oai / sparkling-water
Sparkling Water provides H2O functionality inside Spark cluster
☆979 · Nov 5, 2025 · Updated 8 months ago

databricks / tensorframes
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark
☆744 · Jul 30, 2024 · Updated last year

coral-streaming / coral
Coral is a real-time analytics and data science platform. It transforms streaming events and extract patterns from data via RESTful APIs.…
☆148 · Sep 5, 2019 · Updated 6 years ago

spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
☆3,142 · May 16, 2023 · Updated 3 years ago

TrueCar / mleap
MLeap allows for easily putting Spark ML pipelines into production
☆78 · Oct 27, 2016 · Updated 9 years ago

cloudml / zen
Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logi…
☆169 · Nov 17, 2018 · Updated 7 years ago

tresata / spark-sorted
Secondary sort and streaming reduce for Apache Spark
☆77 · Jul 3, 2023 · Updated 3 years ago

amplab / succinct
Enabling queries on compressed data.
☆282 · Dec 16, 2023 · Updated 2 years ago

filodb / FiloDB
Distributed Prometheus time series database
☆1,468 · Updated this week

scalanlp / breeze
Breeze is/was a numerical processing library for Scala.
☆3,455 · Oct 4, 2025 · Updated 9 months ago

mesos / myriad
https://github.com/apache/incubator-myriad is our new home. See
☆251 · Dec 2, 2015 · Updated 10 years ago

sjyk / sampleclean-async
☆92 · Nov 15, 2015 · Updated 10 years ago

spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago

OryxProject / oryx
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
☆1,783 · Aug 16, 2021 · Updated 4 years ago

yahoo / CaffeOnSpark
Distributed deep learning on Hadoop and Spark clusters.
☆1,261 · Nov 15, 2019 · Updated 6 years ago

factorie / factorie
FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a suc…
☆552 · Dec 19, 2017 · Updated 8 years ago

apache / incubator-toree
Mirror of Apache Toree (Incubating)
☆751 · Updated this week

amplab / ml-matrix
Distributed Matrix Library
☆73 · Jan 28, 2017 · Updated 9 years ago

scalanlp / nak
The Nak Machine Learning Library
☆342 · Jul 18, 2017 · Updated 9 years ago

huawei-noah / streamDM
Stream Data Mining Library for Spark Streaming
☆497 · Apr 16, 2023 · Updated 3 years ago

twitter / chill
Scala extensions for the Kryo serialization library
☆618 · Aug 19, 2024 · Updated last year

twitter / scalding
A Scala API for Cascading
☆3,522 · May 28, 2023 · Updated 3 years ago

springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47 · Aug 1, 2016 · Updated 9 years ago

saddle / saddle
SADDLE: Scala Data Library
☆508 · Mar 21, 2020 · Updated 6 years ago

collectivemedia / modelmatrix
Sparse feature extraction with Spark
☆30 · Jul 25, 2018 · Updated 7 years ago

Stratio / sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
☆530 · Oct 24, 2019 · Updated 6 years ago

dirkneumann / deepdist
Distributed Deep Learning on Spark
☆403 · Oct 8, 2016 · Updated 9 years ago

sameeragarwal / blinkdb
BlinkDB: Sub-Second Approximate Queries on Very Large Data.
☆660 · Feb 6, 2014 · Updated 12 years ago

adobe-research / spark-cluster-deployment
Automates Spark standalone cluster tasks with Puppet and Fabric.
☆43 · Aug 14, 2014 · Updated 11 years ago

airbnb / aerosolve
A machine learning package built for humans.
☆4,809 · Nov 6, 2025 · Updated 8 months ago

massie / spark-parquet-example
Example project to show how to use Spark to read and write Avro/Parquet files
☆50 · Aug 21, 2013 · Updated 12 years ago