matpalm / resemblance
trying shingling / resemblance / simhash / sketching to do some data deduping
☆99 Updated 9 years ago
Alternatives and similar repositories for resemblance
Users who are interested in resemblance are comparing it to the libraries listed below.
- Pretty fast parser for probabilistic context free grammars ☆87 Updated 12 years ago
- A high performance distributed graph database. ☆131 Updated 6 years ago
- Bulk loading for elastic search ☆184 Updated last year
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆25 Updated 16 years ago
- Evaluate any text against a collection of match rules ☆143 Updated 11 years ago
- Various implementations of the forget table: a distributional database that forgets data ☆200 Updated 10 years ago
- Bayesian classifier on top of Redis ☆62 Updated 13 years ago
- Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection. ☆108 Updated 11 years ago
- An implementation of the HyperLogLog algorithm backed by Redis ☆171 Updated 9 years ago
- Ruby client library for controlling Google Refine ☆44 Updated 7 years ago
- Amazon's elastic mapreduce ruby client. Ruby 1.9.X compatible ☆84 Updated 10 years ago
- A repository of non-native, useful redis commands, scripted in lua. ☆61 Updated 13 years ago
- My original graph database DSL machine ☆176 Updated 4 years ago
- ☆116 Updated 13 years ago
- The reference implementation of the SPEAR ranking algorithm in Python. ☆37 Updated 9 years ago
- Neo4jr-Social is a self contained HTTP REST + JSON interface to the graph database Neo4j. Neo4jr-Social supports simple dynamic node crea… ☆166 Updated 14 years ago
- ActiveColumn is a data management framework for Cassandra. It includes data migrations similar to ActiveRecord, and a data mapping frame… ☆53 Updated last year
- Ruby on Hadoop: Efficient, effective Hadoop streaming & bulk data processing. Write micro scripts for terabyte-scale data ☆497 Updated 10 years ago
- A minimalist realtime full-text search index ☆152 Updated 12 years ago
- Full text search with any type of class or data store using Redis ☆163 Updated 14 years ago
- Python wrapper for the Vowpal Wabbit machine learning library. ☆53 Updated 11 years ago
- Postgres insights made easy ☆156 Updated 5 years ago
- Modular Street Address Geocoder ☆395 Updated 12 years ago
- Jeremy's Machine Learning Library ☆52 Updated 9 years ago
- Gitan is a very basic web interface to create and inspect bare git repositories ☆45 Updated 14 years ago
- News Aggregator that classifies and clusterifies news from different sources ☆46 Updated 13 years ago
- Zero-downtime table migrations in MySQL ☆230 Updated 11 years ago
- Permutation library in Ruby ☆29 Updated 11 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 Updated 12 years ago
- A Redis client in Bash ☆14 Updated 13 years ago