rahularora / MinHashLinks
Estimating how similar two sets are using MinHash (Jaccard similarity coefficient)
☆30 · Updated 12 years ago
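The description above refers to the standard MinHash idea. For a random hash function h, the probability that min h(A) equals min h(B) is the Jaccard coefficient |A ∩ B| / |A ∪ B|, so the fraction of matching minima across many hash functions estimates the similarity. The following is a minimal sketch of that idea, not the code from MinHashLinks. The hash family h(x) = (a·x + b) mod p, the SHA-1 element hashing, the 256-function signature length, and all function names are assumptions of this example.

```python
import hashlib
import random

# Mersenne prime 2^61 - 1 used as the modulus of the hash family (assumption of this sketch).
_PRIME = (1 << 61) - 1


def _base_hash(item: str) -> int:
    """Stable 64-bit integer for an element (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_perm: int, seed: int = 1) -> list[tuple[int, int]]:
    """Draw (a, b) pairs defining num_perm hash functions h(x) = (a*x + b) mod p."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)]


def minhash_signature(items: set[str], params: list[tuple[int, int]]) -> list[int]:
    """For each hash function, keep the minimum hash value over the (non-empty) set."""
    base = [_base_hash(x) for x in items]
    return [min((a * x + b) % _PRIME for x in base) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of signature positions that agree; estimates |A & B| / |A | B|."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard coefficient, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    params = make_hash_params(256)
    a = set("the quick brown fox jumps over the lazy dog".split())
    b = set("the quick brown fox leaps over the lazy cat".split())
    print("exact:   ", exact_jaccard(a, b))  # 6 shared words out of 10 distinct -> 0.6
    print("estimate:", estimate_jaccard(minhash_signature(a, params),
                                        minhash_signature(b, params)))
```

The estimate's standard error shrinks roughly as 1/sqrt(num_perm), so longer signatures trade memory and hashing time for accuracy. Both signatures must be built with the same `params`, otherwise the positions are not comparable.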
Alternatives and similar repositories for MinHash
Users interested in MinHash are comparing it to the libraries listed below.
- LSH-based high-dimensional clustering for sets and points ☆80 · Updated 11 years ago
- A fast Python implementation of locality-sensitive hashing. ☆71 · Updated 10 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 · Updated 3 years ago
- Example Python code for comparing documents using MinHash ☆252 · Updated 6 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets. ☆148 · Updated last year
- A pure Python implementation of locality-sensitive hashing for text documents ☆87 · Updated 10 years ago
- Tools, wrappers, etc. for data science with a concentration on text processing ☆207 · Updated 3 years ago
- A forest of random projection trees ☆224 · Updated 5 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆98 · Updated 10 years ago
- A Generalized Data Cleaning System ☆50 · Updated 9 years ago
- Refinery - A locally deployable open-source web platform for analysis of large document collections ☆101 · Updated 9 years ago
- Recommender System Framework ☆126 · Updated 9 years ago
- FluRS: A Python library for streaming recommendation algorithms ☆109 · Updated 3 years ago
- Solution to Facebook's link prediction contest on Kaggle. ☆206 · Updated 13 years ago
- Topological Anomaly Detection (TAD) per Gartley and Basener 2009 ☆68 · Updated 5 years ago
- POC IDS anomaly detection engine built with IPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,… ☆79 · Updated 11 years ago
- CrowdRec reference framework ☆32 · Updated 9 years ago
- Lightweight Python wrapper for Vowpal Wabbit ☆174 · Updated 5 years ago
- ☆92 · Updated 10 years ago
- Some add-on modules for the networkx library ☆78 · Updated 5 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 · Updated 8 years ago
- Instructions & code for the EuroPython 2014 training session "Topic Modeling for Fun and Profit" ☆110 · Updated 11 years ago
- Word2Vec models with Twitter data using Spark. Blog: ☆66 · Updated 6 years ago
- A curated inventory of machine learning methods available on the Apache Spark platform, both in official and third-party libraries. ☆65 · Updated 8 years ago
- Natural Language Processing with Spark's MLlib ☆63 · Updated 8 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD] ☆107 · Updated 12 years ago
- A library that allows serialization of scikit-learn estimators into PMML ☆72 · Updated 6 years ago
- ☆35 · Updated 12 years ago
- Predicting closed questions on Stack Overflow ☆44 · Updated 8 years ago
- The 2nd-place solution for the West Nile Virus Prediction challenge on Kaggle ☆36 · Updated 10 years ago