Production-ready K-Means clustering for Apache Spark with pluggable Bregman divergences (KL, Itakura-Saito, L1, etc.). 6 algorithms, 740 tests, cross-version persistence. Drop-in replacement for MLlib with mathematically correct distance functions for probability distributions, spectral data, and count data.
☆342 · Feb 14, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for generalized-kmeans-clustering
Users who are interested in generalized-kmeans-clustering are comparing it to the libraries listed below.
- A package full of linear algebra operators for Apache Spark MLlib's linalg package ☆10 · Sep 9, 2015 · Updated 10 years ago
- A versatile and powerful data platform allowing interactive searches, dashboards, alerts, and more. ☆26 · Sep 12, 2025 · Updated 5 months ago
- An email segmentation system (reference implementation of ECIR 2018 paper) ☆10 · Oct 21, 2019 · Updated 6 years ago
- Gust is a set of GPU extensions for Breeze. ☆32 · Apr 10, 2015 · Updated 10 years ago
- Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark ☆146 · Jan 26, 2016 · Updated 10 years ago
- C++ raytracer that supports custom models. Supports running the calculations on the CPU using C++11 threads or in the GPU via CUDA. ☆74 · Dec 24, 2022 · Updated 3 years ago
- Automatically generated text for brainstorming/mindmapping purposes. ☆25 · Jul 15, 2023 · Updated 2 years ago
- Fast similarity search using DuckDB ☆146 · Oct 30, 2024 · Updated last year
- Visualize streaming machine learning in Spark ☆177 · Jun 29, 2017 · Updated 8 years ago
- ☆722 · Aug 15, 2025 · Updated 6 months ago
- SBT Plugins for AI2 projects ☆24 · Oct 14, 2022 · Updated 3 years ago
- High-Performance Klong array language in Python. ☆314 · Updated this week
- Real Time Analytics and Data Pipelines based on Spark Streaming ☆531 · Oct 24, 2019 · Updated 6 years ago
- Benchmarks of artificial neural network library for Spark MLlib ☆11 · Dec 3, 2015 · Updated 10 years ago
- Text Mining Library with a focus on Latent Semantic Analysis ☆12 · Aug 4, 2013 · Updated 12 years ago
- A Neural network implementation with Scala ☆20 · Jul 17, 2016 · Updated 9 years ago
- Algebraic enhancements for GEMM & AI accelerators ☆288 · Feb 28, 2025 · Updated last year
- R.L. methods and techniques. ☆199 · Feb 28, 2026 · Updated last week
- Experiments with applying Fourier transforms to various plane-filling curves and patterns ☆65 · Apr 17, 2023 · Updated 2 years ago
- A BERT that you can train on a (gaming) laptop. ☆209 · Sep 8, 2023 · Updated 2 years ago
- distinct counters and aggregate functions for distinct estimation in PostgreSQL ☆17 · Nov 3, 2019 · Updated 6 years ago
- A reference implementation for a weave data structure to allow quick reconstruction of old versions of a compressed repository in version… ☆17 · Jan 2, 2016 · Updated 10 years ago
- A reimplementation of https://github.com/otiai10/gosseract without CGo, running Tesseract compiled to WASM with Wazero ☆156 · Jun 23, 2025 · Updated 8 months ago
- DBSCAN clustering algorithm on top of Apache Spark ☆264 · Mar 28, 2018 · Updated 7 years ago
- Splash Project for parallel stochastic learning ☆93 · Jun 16, 2017 · Updated 8 years ago
- Deriving Spark DataFrame schemas from case classes ☆44 · Jun 24, 2024 · Updated last year
- A playground to make it easy to try crazy things ☆33 · Feb 13, 2026 · Updated 3 weeks ago
- Animating R1's thoughts. ☆382 · Feb 17, 2025 · Updated last year
- Node.js kafka connect connector for prometheus ☆13 · Dec 7, 2022 · Updated 3 years ago
- Recovering Anthony Bourdain's lost li.st entries. ☆26 · Jan 8, 2026 · Updated 2 months ago
- Simple implementation of KMeans clustering on Flink, using iterations ☆10 · Nov 15, 2018 · Updated 7 years ago
- Word2Vec - Google's word2vec in Scala using UMASS factorie library for better hacking and research. ☆16 · Apr 7, 2014 · Updated 11 years ago
- Distributed lbfgs on Apache Spark ☆11 · Sep 25, 2020 · Updated 5 years ago
- Self-hostable headless QR code generator ☆18 · Feb 5, 2026 · Updated last month
- A scientific instrument for investigating latent spaces ☆752 · Nov 17, 2025 · Updated 3 months ago
- Quick summary: This code implements a spectral (third order tensor decomposition) learning method for learning LDA topic model on Spark. ☆104 · Jul 2, 2018 · Updated 7 years ago
- Statistics utilities for the JVM - in Scala! ☆92 · Nov 13, 2017 · Updated 8 years ago
- Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets … ☆1,114 · Updated this week
- ☆53 · Aug 21, 2025 · Updated 6 months ago