derrickburns / generalized-kmeans-clustering
Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high-dimensional data, and very large data sets.
☆304 Updated this week
Alternatives and similar repositories for generalized-kmeans-clustering
Users who are interested in generalized-kmeans-clustering are comparing it to the libraries listed below.
- ☆707 Updated 2 months ago
- A Detailed Introduction to My Favorite Statistical Measure, Hoeffding's D ☆100 Updated last year
- Absolute minimalistic implementation of a GPT-like transformer using only numpy (<650 lines). ☆253 Updated last year
- Simplifying robust end-to-end machine learning on Apache Spark. ☆474 Updated 8 years ago
- A Kurtosis package for Python data engineers, deploying a Jupyter notebook along with a configurable set of databases, and a visualizatio… ☆109 Updated last year
- Automated, smooth, N'th order derivatives of non-uniformly sampled time series data ☆228 Updated last year
- Visualize text embeddings ☆40 Updated 2 years ago
- Convert a scikit-learn decision tree into a Keras model ☆39 Updated last year
- Generate Cool-Looking Mazes and Animations Illustrating the A* Pathfinding Algorithm ☆177 Updated 7 months ago
- Wayeb is a Complex Event Processing and Forecasting (CEP/F) engine written in Scala. ☆151 Updated last year
- The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors. ☆404 Updated 7 months ago
- Lossy Counting and Sticky Sampling implementation for efficient frequency counts on data streams. ☆63 Updated 9 years ago
- R.L. methods and techniques. ☆200 Updated 11 months ago
- Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l… ☆286 Updated last month
- Docker-based inference engine for AMD GPUs ☆230 Updated last year
- A BERT that you can train on a (gaming) laptop. ☆207 Updated 2 years ago
- Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipeli… ☆644 Updated last week
- Mapping the French Culinary Universe ☆48 Updated 7 months ago
- A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more. ☆104 Updated last year
- BigTable, Document and Graph Database with Full Text Search ☆186 Updated 7 years ago
- Optimally allocate poker chips using constrained, nonlinear optimization ☆174 Updated 10 months ago
- Run and explore Llama models locally with minimal dependencies on CPU ☆189 Updated last year
- Tools for working with parquet, impala, and hive ☆134 Updated 4 years ago
- Algebraic enhancements for GEMM & AI accelerators ☆279 Updated 7 months ago
- Lamport's Bakery Algorithm Demonstrated in Python ☆95 Updated last year
- Arc is an opinionated framework for defining data pipelines which are predictable, repeatable and manageable. ☆169 Updated last year
- Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems ☆105 Updated 6 months ago
- Revealing example of self-attention, the building block of transformer AI models ☆130 Updated 2 years ago
- ☆155 Updated 11 months ago
- Conversion of Jupyter and Zeppelin notebooks to Jupyter or Markdown formats ☆27 Updated 5 years ago