rahularora / MinHash
Estimates how similar two sets are using MinHash (Jaccard similarity coefficient)
☆30 · Updated 11 years ago
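For readers unfamiliar with the technique, here is a minimal sketch of how MinHash can estimate the Jaccard similarity of two sets. It is not taken from the rahularora / MinHash repository; the function names (`minhash_signature`, `estimate_jaccard`, `exact_jaccard`), the use of seeded BLAKE2b as the hash family, and the assumption that set elements are strings are all illustrative choices.

```python
import hashlib


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of `item`; each seed acts as a different hash function.

    Assumption: seeded BLAKE2b stands in for the random permutations
    used in the textbook description of MinHash.
    """
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes: int = 128):
    """Return the minimum hash value of the set under each hash function."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b) -> float:
    """Fraction of signature positions where the two minima agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b) -> float:
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    set_a = {f"w{i}" for i in range(80)}        # w0 .. w79
    set_b = {f"w{i}" for i in range(40, 120)}   # w40 .. w119
    sig_a = minhash_signature(set_a, num_hashes=256)
    sig_b = minhash_signature(set_b, num_hashes=256)
    print("exact:   ", exact_jaccard(set_a, set_b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

The estimate works because, for a single hash function, the probability that two sets share the same minimum equals their Jaccard coefficient. Averaging over more hash functions reduces the variance, at the cost of longer signatures.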
Alternatives and similar repositories for MinHash:
Users who are interested in MinHash are comparing it to the libraries listed below.
- LSH based high dimensional clustering for sets and points ☆78 · Updated 10 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆86 · Updated 9 years ago
- A fast Python implementation of locality sensitive hashing. ☆70 · Updated 9 years ago
- POC IDS anomaly detection engine built with iPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,… ☆78 · Updated 10 years ago
- Implementation of Bayesian Sets for fast similarity searches. ☆15 · Updated 13 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 · Updated 3 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- ☆26 · Updated 7 years ago
- ☆24 · Updated 6 years ago
- Vowpal Wabbit Webservice. A web service that accepts VW formatted text and runs it through a VW daemon instance. ☆40 · Updated 8 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆21 · Updated 8 years ago
- FluRS: A Python library for streaming recommendation algorithms ☆109 · Updated 2 years ago
- feng - feature engineering for machine-learning champions ☆27 · Updated 7 years ago
- ☆38 · Updated 8 years ago
- Implementation of unsupervised feature selection algorithm proposed by [Huang, et al. 2015] ☆10 · Updated 9 years ago
- Topological Anomaly Detection (TAD) per Gartley and Basener 2009 ☆69 · Updated 4 years ago
- lightweight python wrapper for vowpal wabbit ☆166 · Updated 5 years ago
- Simple approximate-nearest-neighbours in Python using locality sensitive hashing. ☆140 · Updated 12 years ago
- A pure Python implementation of Aho-Corasick algorithm. ☆22 · Updated 6 years ago
- Demo code contrasting Google Dataflow (Apache Beam) with Apache Spark ☆14 · Updated 8 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆146 · Updated 4 months ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆99 · Updated 9 years ago
- A library that allows serialization of SciKit-Learn estimators into PMML ☆70 · Updated 5 years ago
- Wabbit Wappa is a full-featured Python wrapper for the Vowpal Wabbit machine learning utility. ☆101 · Updated 7 years ago
- Code for generating analyses found in "Analyzing Log Analysis: An Empirical Study of User Log Mining" to appear in LISA 2014. ☆8 · Updated 10 years ago
- An API for Distributed Machine Learning ☆154 · Updated 8 years ago
- Natural Language Processing with Spark's MLlib ☆62 · Updated 7 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 · Updated 7 years ago
- ☆61 · Updated 8 years ago
- ☆35 · Updated 11 years ago