matpalm / resemblance
trying shingling / resemblance / simhash / sketching to do some data deduping
☆98 · Updated 9 years ago
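The repository description names shingling, resemblance and sketching as deduplication tools. The sketch below shows how the three fit together, assuming word-level shingles and blake2b-based seeded hashing. It is not the repository's own code, and the function names `shingles`, `resemblance`, `seeded_hash`, `minhash_signature` and `estimate_resemblance` are hypothetical. It computes exact Jaccard resemblance over shingle sets and a MinHash sketch that estimates the same quantity from fixed-size signatures:

```python
import hashlib


def shingles(text, k=4):
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Documents shorter than k words become a single shingle (or nothing).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b):
    """Exact Jaccard resemblance: intersection size over union size."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def seeded_hash(shingle, seed):
    """Stable 64-bit hash of a shingle; each seed acts as a separate hash function."""
    data = f"{seed}:{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """MinHash sketch: for each hash function, keep the minimum value over the set."""
    if not shingle_set:
        raise ValueError("cannot sketch an empty shingle set")
    return [min(seeded_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def estimate_resemblance(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates resemblance."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "the quick brown fox jumps over the lazy cat near the river bank"
a, b = shingles(doc1, k=3), shingles(doc2, k=3)
print(resemblance(a, b))  # 8 shared of 14 distinct shingles, about 0.571
print(estimate_resemblance(minhash_signature(a), minhash_signature(b)))
```

Comparing two signatures costs `num_hashes` comparisons regardless of document length, which is what makes sketching attractive for deduplicating large collections. The estimate's standard error shrinks roughly as 1/sqrt(`num_hashes`).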
Alternatives and similar repositories for resemblance:
Users who are interested in resemblance are comparing it to the libraries listed below:
- Pretty fast parser for probabilistic context free grammars ☆87 · Updated 11 years ago
- A high performance distributed graph database. ☆130 · Updated 5 years ago
- Jeremy's Machine Learning Library ☆52 · Updated 8 years ago
- Various implementations of the forget table: a distributional database that forgets data ☆200 · Updated 10 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆24 · Updated 15 years ago
- Bayesian classifier on top of Redis ☆63 · Updated 12 years ago
- Redis bulk-loader for Apache Pig ☆40 · Updated 12 years ago
- My original graph database DSL machine ☆176 · Updated 3 years ago
- Evaluate any text against a collection of match rules ☆143 · Updated 11 years ago
- ☆116 · Updated 12 years ago
- An extension to PostgreSQL allowing Kyoto Cabinets to be used as a backing data store. ☆54 · Updated 4 months ago
- Bulk loading for elastic search ☆185 · Updated last year
- Easy Map/Reduce with Hadoop and Ruby. Also see http://github.com/forward/mandy-lab for examples. ☆45 · Updated 13 years ago
- Example application using neo4j.rb ☆45 · Updated 12 years ago
- Ruby interface to Hadoop's HDFS via Thrift ☆50 · Updated 11 years ago
- News Aggregator that classifies and clusterifies news from different sources ☆46 · Updated 13 years ago
- Script to hop to common directories and servers ☆112 · Updated 11 years ago
- The reference implementation of the SPEAR ranking algorithm in Python. ☆37 · Updated 9 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 · Updated 7 years ago
- Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection. ☆108 · Updated 11 years ago
- Dynamic Visualization LEGO ☆129 · Updated 5 months ago
- Latent Dirichlet Allocation for topic modeling of streamed data sources ☆102 · Updated 9 years ago
- Full text search with any type of class or data store using Redis ☆163 · Updated 14 years ago
- ActiveColumn is a data management framework for Cassandra. It includes data migrations similar to ActiveRecord, and a data mapping frame… ☆53 · Updated 10 months ago
- ☆43 · Updated 11 years ago
- A minimalist realtime full-text search index ☆152 · Updated 12 years ago
- A Redis-backed statistics storage and querying library written in Ruby. ☆155 · Updated 6 years ago
- Create and install double-tab (‘tab tab’) auto-completions for any command-line application on any shell (bash, fish, ksh, etc) ☆91 · Updated 13 years ago
- An implementation of the HyperLogLog algorithm backed by Redis ☆172 · Updated 9 years ago
- Experimental bridge between RabbitMQ and Redis implemented as a RabbitMQ plugin ☆32 · Updated 13 years ago
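The resemblance repository description also lists simhash. Several of the alternatives above, such as the MinHash implementation and the Mneme duplicate-detection service, target the same near-duplicate problem. MinHash estimates the Jaccard resemblance of two sets. Charikar-style simhash works differently: it compresses a document into one fixed-width fingerprint, and the Hamming distance between two fingerprints reflects the angle between their token-frequency vectors. The minimal sketch below assumes whitespace tokens weighted by frequency and blake2b as the 64-bit token hash. Both are illustrative choices, not taken from any listed project, and the names `token_hash`, `simhash` and `hamming` are hypothetical:

```python
import hashlib

BITS = 64  # fingerprint width; matches the 8-byte token hash below


def token_hash(token):
    """Stable 64-bit hash of a token (8-byte blake2b digest)."""
    return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")


def simhash(text):
    """Simhash: each token votes +1/-1 per bit; the fingerprint keeps each tally's sign."""
    tally = [0] * BITS
    for token in text.lower().split():  # repeated tokens vote repeatedly (frequency weighting)
        h = token_hash(token)
        for i in range(BITS):
            tally[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, votes in enumerate(tally):
        if votes > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(x, y):
    """Number of differing bits between two fingerprints."""
    return bin(x ^ y).count("1")


base = ("shingling resemblance simhash and sketching are all ways to find "
        "near duplicate documents in a large collection without comparing every pair in full")
near = base.replace("large", "huge")
other = "redis backed statistics storage with a ruby interface for counting events per minute"
print(hamming(simhash(base), simhash(near)))   # small for near-duplicates
print(hamming(simhash(base), simhash(other)))  # typically close to BITS / 2 for unrelated text
```

To deduplicate, fingerprints within a small Hamming distance of each other would be treated as candidate duplicates. The threshold is a tuning choice, not something fixed by the technique.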