derrickburns/generalized-kmeans-clustering

Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL, Itakura-Saito, L1, etc.). 6 algorithms, 740 tests, cross-version persistence. Drop-in replacement for MLlib with mathematically correct distance functions for probability distributions, spectral data, and count data.

☆342

Alternatives and similar repositories for generalized-kmeans-clustering

Users that are interested in generalized-kmeans-clustering are comparing it to the libraries listed below.

mrsqueeze / spark-hash
Locality Sensitive Hashing for Apache Spark
☆198 · Nov 1, 2016 · Updated 9 years ago
ianoc / SparkEMRBootstrap
Files to help make new Spark EMR bootstraps
☆15 · Aug 4, 2013 · Updated 12 years ago
ankurdave / kmeans-spark
A simple implementation of k-means clustering on the Spark cluster computing framework. See http://cs.berkeley.edu/~matei/spark.
☆26 · Apr 9, 2011 · Updated 15 years ago
yahoo / SparkADMM
Generic Implementation of Consensus ADMM over Spark
☆84 · Jul 8, 2016 · Updated 10 years ago
dlwh / gust
Gust is a set of GPU extensions for Breeze.
☆32 · Apr 10, 2015 · Updated 11 years ago
avulanov / ann-benchmark
Benchmarks of artificial neural network library for Spark MLlib
☆11 · Dec 3, 2015 · Updated 10 years ago
daithiocrualaoich / spark-emr
Spark Elastic MapReduce bootstrap and runnable examples.
☆17 · Jun 26, 2013 · Updated 13 years ago
databricks / spark-tfocs
A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)
☆90 · Apr 15, 2024 · Updated 2 years ago
TimoKats / inspyr
Automatically generated text for brainstorming/mindmapping purposes.
☆25 · Jul 15, 2023 · Updated 3 years ago
viirya / SparkAffinityPropagation
Affinity Propagation on Spark
☆20 · May 31, 2021 · Updated 5 years ago
QiqiDuan257 / parallel-pso-spark
Parallel Particle Swarm Optimizer on the Spark Clustering Computing Platform.
☆12 · Oct 29, 2018 · Updated 7 years ago
maxdotio / neural-solr
Neural Solr = Solr 9 + Mighty Inference + Node
☆18 · Jun 9, 2022 · Updated 4 years ago
saurfang / sbt-spark-submit
sbt plugin for spark-submit
☆96 · Nov 2, 2017 · Updated 8 years ago
patricktrainer / duckdb-embedding-search
Fast similarity search using DuckDB
☆150 · Oct 30, 2024 · Updated last year
thatipamula-jashwanth / smart-knn
smartKNN - A feature-weighted KNN algorithm with automatic preprocessing, normalization, and learned feature importance.
☆33 · Mar 11, 2026 · Updated 4 months ago
maxilevi / raytracer
C++ raytracer that supports custom models. Supports running the calculations on the CPU using C++11 threads or on the GPU via CUDA.
☆74 · Dec 24, 2022 · Updated 3 years ago
freeman-lab / spark-ml-streaming
Visualize streaming machine learning in Spark
☆176 · Jun 29, 2017 · Updated 9 years ago
armandgrillet / stsc
An implementation of the Self-Tuning Spectral Clustering algorithm, and more.
☆12 · Sep 4, 2016 · Updated 9 years ago
collectivemedia / spark-hyperloglog
Interactive Audience Analytics with Spark and HyperLogLog
☆55 · Oct 14, 2015 · Updated 10 years ago
databricks / tensorframes
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark
☆744 · Jul 30, 2024 · Updated last year
google / graph-mining
☆738 · Aug 15, 2025 · Updated 11 months ago
Stratio / sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
☆530 · Oct 24, 2019 · Updated 6 years ago
crmulliner / fluxnode
FluxN0de
☆14 · Feb 22, 2022 · Updated 4 years ago
intel-machine-learning / DistML
DistML provides a supplement to MLlib to support model parallelism on Spark
☆170 · Feb 6, 2017 · Updated 9 years ago
neospe / autofit2
Automated end-to-end data preprocessing, model training, and evaluation pipeline
☆17 · Jun 3, 2026 · Updated last month
alitouka / spark_dbscan
DBSCAN clustering algorithm on top of Apache Spark
☆264 · Mar 28, 2018 · Updated 8 years ago
jiayuasu / JTSplus
JTS Topology Suite 1.14 with additional functions for GeoSpark
☆14 · Jan 5, 2018 · Updated 8 years ago
Dicklesworthstone / bakery_algorithm
Lamport's Bakery Algorithm Demonstrated in Python
☆98 · Jan 19, 2024 · Updated 2 years ago
zenecture / neuroflow
Artificial Neural Networks for Scala
☆112 · Mar 9, 2019 · Updated 7 years ago
fabuzaid21 / yggdrasil
Yggdrasil: Faster Decision Trees Using Column Partitioning in Spark
☆30 · May 17, 2018 · Updated 8 years ago
elodina / syscol
Collect local Mesos slave, underlying operating system and machine metrics and produce to Apache Kafka
☆20 · Jan 29, 2016 · Updated 10 years ago
mxmlnkn / fft-image-experiments
Experiments with applying Fourier transforms to various plane-filling curves and patterns
☆65 · Apr 17, 2023 · Updated 3 years ago
LiYvbo / SparkKmeans
Graduation project source code: optimization of the K-means clustering algorithm based on Spark
☆18 · Jul 18, 2016 · Updated 10 years ago
HarendraKumarSingh / openNLP
Quick-start guide for Java-based Natural Language Processing: training, saving a model, loading a model, and inference.
☆12 · Jul 9, 2018 · Updated 8 years ago
McIndi / delve
A versatile and powerful data platform allowing interactive searches, dashboards, alerts, and more.
☆26 · Jul 7, 2026 · Updated 3 weeks ago
Nike-Inc / koheesio
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipeli…
☆817 · Updated this week
twosigma / flint
A Time Series Library for Apache Spark
☆1,173 · Jul 3, 2020 · Updated 6 years ago
BenFradet / struct-type-encoder
Deriving Spark DataFrame schemas from case classes
☆44 · Jun 24, 2024 · Updated 2 years ago
thunder-project / thunder
scalable analysis of images and time series
☆822 · Jan 6, 2017 · Updated 9 years ago