matpalm / resemblance
trying shingling / resemblance / simhash / sketching to do some data deduping
☆98 Updated 9 years ago
Alternatives and similar repositories for resemblance:
Users who are interested in resemblance are comparing it to the libraries listed below:
- Pretty fast parser for probabilistic context free grammars ☆87 Updated 11 years ago
- A high performance distributed graph database. ☆130 Updated 6 years ago
- An implementation of the HyperLogLog algorithm backed by Redis ☆172 Updated 9 years ago
- Evaluate any text against a collection of match rules ☆143 Updated 11 years ago
- Full text search with any type of class or data store using Redis ☆163 Updated 14 years ago
- Jeremy's Machine Learning Library ☆52 Updated 9 years ago
- Neo4jr-Social is a self contained HTTP REST + JSON interface to the graph database Neo4j. Neo4jr-Social supports simple dynamic node crea… ☆166 Updated 14 years ago
- My original graph database DSL machine ☆176 Updated 4 years ago
- Various implementations of the forget table: a distributional database that forgets data ☆200 Updated 10 years ago
- ☆43 Updated 11 years ago
- Example application using neo4j.rb ☆45 Updated 12 years ago
- Reduce your data. A unix filter for algebird-powered aggregation. ☆138 Updated 7 years ago
- A minimalist realtime full-text search index ☆152 Updated 12 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆24 Updated 15 years ago
- Ruby client library for controlling Google Refine ☆44 Updated 6 years ago
- News Aggregator that classifies and clusterifies news from different sources ☆46 Updated 13 years ago
- Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection. ☆108 Updated 11 years ago
- An experiment with stats, the Ruby way ☆42 Updated 7 years ago
- ZDevice is a Ruby DSL for assembling ZeroMQ routing devices, with support for the ZDCF configuration syntax ☆42 Updated 4 years ago
- Ranked Prefix Search for Large Data on External Memory optimized for Mobile with ZERO lag initialization time ☆16 Updated 6 years ago
- A document vector search with flexible matrix transforms. Currently supports Latent semantic analysis and Term frequency - inverse docume… ☆150 Updated 4 years ago
- Ruby interface to Hadoop's HDFS via Thrift ☆50 Updated 11 years ago
- Bulk loading for elastic search ☆185 Updated last year
- Dynamic Visualization LEGO ☆129 Updated 6 months ago
- Snappy, a fast compressor/decompressor (courtesy of Google) ☆46 Updated 13 years ago
- Amazon's elastic mapreduce ruby client. Ruby 1.9.X compatible ☆84 Updated 10 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample; ☆193 Updated 10 years ago
- ☆116 Updated 13 years ago
- Easy Map/Reduce with Hadoop and Ruby. Also see http://github.com/forward/mandy-lab for examples. ☆45 Updated 13 years ago
- Ferret: the extensible information retrieval library for ruby. ☆278 Updated 2 years ago