jmhodges / minhash
An implementation of the MinHash algorithm in Ruby using MurmurHash
☆25 · Updated 16 years ago
Alternatives and similar repositories for minhash
Users that are interested in minhash are comparing it to the libraries listed below
- A high performance distributed graph database. ☆131 · Updated 6 years ago
- trying shingling / resemblance / simhash / sketching to do some data deduping ☆98 · Updated 10 years ago
- An open source information retrieval system written in C++11 and Python. Aspires to be an alternative to Nutch / Lucene. It uses MongoDB … ☆87 · Updated 2 years ago
- ☆43 · Updated 12 years ago
- A repository of non-native, useful redis commands, scripted in lua. ☆61 · Updated 14 years ago
- Solr for Astrophysics Data System ☆55 · Updated 3 weeks ago
- C++ utility library ☆24 · Updated 11 years ago
- Realtime Analytics ☆68 · Updated 12 years ago
- Redis bulk-loader for Apache Pig ☆40 · Updated 13 years ago
- A REST API for Mozilla Metrics services. ☆57 · Updated 6 years ago
- Round robin database pattern via Redis sorted sets ☆79 · Updated 15 years ago
- Ruby/JRuby bloom filters for bounded and unbounded (streaming) data, FNV hashing and bit fields ☆106 · Updated 2 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample; ☆194 · Updated 11 years ago
- The reference implementation of the SPEAR ranking algorithm in Python. ☆37 · Updated 9 years ago
- KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service ☆42 · Updated 14 years ago
- Bayesian classifier on top of Redis ☆62 · Updated 13 years ago
- A restful web application for real-time typeahead and autocomplete ☆105 · Updated 12 years ago
- distributed realtime searchable database ☆117 · Updated 11 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 · Updated 3 years ago
- A minimalist realtime full-text search index ☆153 · Updated 13 years ago
- Bulk loading for elastic search ☆185 · Updated last year
- A distributed task queue worker designed for throughput, parallelism, and clustering. ☆238 · Updated 2 years ago
- syslog module for nginx ☆18 · Updated 15 years ago
- Utilities for a widely dispersed replicated Redis cluster ☆52 · Updated 15 years ago
- An Elasticsearch plugin that updates specific fields of a document, avoiding a full reindex and reducing traffic costs ☆40 · Updated 11 years ago
- Convert URLs to a normalized unicode format ☆67 · Updated 7 years ago
- A/B test analysis library for Ruby - performs Chi-Square tests and G-tests on A/B results ☆40 · Updated 11 years ago
- Flexibly analyze text for profanity, racial slurs, and sexual words. ☆18 · Updated 14 years ago
- A very memory-efficient trie (radix tree) implementation ☆47 · Updated 13 years ago
- Experimental bridge between RabbitMQ and Redis implemented as a RabbitMQ plugin ☆32 · Updated 14 years ago