rahularora / MinHash
Estimating how similar two sets are using MinHash (Jaccard similarity coefficient)
☆31 · Updated 12 years ago
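The idea behind the description above can be illustrated with a minimal sketch. This is not the code from rahularora/MinHash; it assumes that seeded BLAKE2b hashes (one key per seed) stand in for independent hash functions, and the function names `minhash_signature`, `estimate_jaccard` and `exact_jaccard` are hypothetical. For each hash function the signature keeps the minimum hash over the set, and the fraction of positions where two signatures agree estimates the Jaccard coefficient |A ∩ B| / |A ∪ B|.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    # Assumption: a keyed 64-bit BLAKE2b digest, one key per seed,
    # plays the role of one independent hash function.
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    # Hypothetical helper: for each hash function, keep the minimum hash value over the set.
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    # The probability that two sets share the same minimum equals their Jaccard
    # coefficient, so the fraction of agreeing positions is an unbiased estimate.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    # Exact |A ∩ B| / |A ∪ B|, for comparison with the estimate.
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {f"w{i}" for i in range(100)}
    b = {f"w{i}" for i in range(50, 150)}
    print("exact:", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a, 256), minhash_signature(b, 256)))
```

More hash functions tighten the estimate: its standard error is roughly sqrt(J(1 - J) / k) for k hash functions, at the cost of a longer signature.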
Alternatives and similar repositories for MinHash
Users who are interested in MinHash are comparing it to the libraries listed below
- POC IDS anomaly detection engine built with IPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,… ☆79 · Updated 10 years ago
- A fast Python implementation of locality sensitive hashing. ☆70 · Updated 10 years ago
- This project contains the code to translate between Apache Spark and SFrame. ☆20 · Updated 8 years ago
- LSH based high dimensional clustering for sets and points ☆79 · Updated 10 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆85 · Updated 9 years ago
- My capstone project for Galvanize (Zipfian Academy) ☆38 · Updated 6 years ago
- Natural Language Processing with Spark's MLlib ☆62 · Updated 7 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 · Updated 3 years ago
- ☆75 · Updated 12 years ago
- ☆35 · Updated 11 years ago
- feng - feature engineering for machine-learning champions ☆27 · Updated 8 years ago
- Simple example on how to use Naive Bayes on Spark using the popular Reuters 21578 dataset ☆23 · Updated 10 years ago
- ☆26 · Updated 8 years ago
- Preparing DMOZ dataset for my n-Gram LM-based URL classification research ☆32 · Updated 10 years ago
- Implementation of unsupervised feature selection algorithm proposed by [Huang, et al. 2015] ☆10 · Updated 9 years ago
- CrowdRec reference framework ☆32 · Updated 8 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- PySpark for Elastic Search ☆55 · Updated 8 years ago
- ☆24 · Updated 7 years ago
- 🍊 Add-on for Orange3 to support recommender systems. ☆24 · Updated 5 years ago
- Anomaly Detection model uses Spark for training and Spark Streaming for testing ☆67 · Updated 9 years ago
- PyData NYC 2014 Scikit Learn Tutorial ☆65 · Updated 10 years ago
- Recommender System Framework ☆125 · Updated 8 years ago
- An Apache Lucene TokenFilter that uses word2vec vectors for term expansion. ☆24 · Updated 11 years ago
- Predicting closed questions on Stack Overflow ☆44 · Updated 7 years ago
- KDD Cup 2013 track1 code ☆29 · Updated 11 years ago
- Topological Anomaly Detection (TAD) per Gartley and Basener 2009 ☆69 · Updated 5 years ago
- Implementation of Bayesian Sets for fast similarity searches. ☆14 · Updated 13 years ago
- Anomaly detection training suite ☆119 · Updated 9 years ago
- Tool to visualize data quickly with no brain usage for plot creation ☆46 · Updated 6 years ago