matpalm / resemblance
Experiments with shingling, resemblance, simhash, and sketching for data deduplication
☆98 · Updated 9 years ago
Related projects:
- A high performance distributed graph database. ☆130 · Updated 5 years ago
- Pretty fast parser for probabilistic context free grammars ☆86 · Updated 11 years ago
- Various implementations of the forget table: a distributional database that forgets data ☆201 · Updated 9 years ago
- Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection. ☆108 · Updated 11 years ago
- Evaluate any text against a collection of match rules ☆145 · Updated 10 years ago
- Easy Map/Reduce with Hadoop and Ruby. Also see http://github.com/forward/mandy-lab for examples. ☆45 · Updated 12 years ago
- Redis bulk-loader for Apache Pig ☆40 · Updated 12 years ago
- Jeremy's Machine Learning Library ☆52 · Updated 8 years ago
- ZDevice is a Ruby DSL for assembling ZeroMQ routing devices, with support for the ZDCF configuration syntax ☆42 · Updated 3 years ago
- ActiveColumn is a data management framework for Cassandra. It includes data migrations similar to ActiveRecord, and a data mapping frame… ☆54 · Updated 6 months ago
- playing around with the common crawl dataset ☆70 · Updated 12 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆23 · Updated 15 years ago
- My original graph database DSL machine ☆176 · Updated 3 years ago
- finger GitHub users ☆34 · Updated 12 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample; ☆193 · Updated 10 years ago
- Full text search with any type of class or data store using Redis ☆163 · Updated 14 years ago
- Ruby interface to Hadoop's HDFS via Thrift ☆50 · Updated 10 years ago
- News Aggregator that classifies and clusterifies news from different sources ☆46 · Updated 12 years ago
- IMPORTANT: Data Brewery is now Bubbles: https://github.com/stiivi/bubbles This brewery repository is NOT MAINTAINED any more. ☆134 · Updated 11 years ago
- A heap inspector for live memcached instances. ☆105 · Updated 12 years ago
- Collectd Plugin for Librato Metrics ☆38 · Updated 8 years ago
- An unsupervised language identification algorithm in Ruby, built originally for detecting English-language tweets. ☆39 · Updated 13 years ago
- Experimental Swiss Army Knife of Network Concurrency, ZeroMQ, EventMachine, WebSockets, HTTP, and More ☆144 · Updated 7 years ago
- Bulk loading for elastic search ☆186 · Updated 9 months ago
- Example application using neo4j.rb ☆45 · Updated 11 years ago
- Bayesian classifier on top of Redis ☆63 · Updated 12 years ago
- Neo4jr-Social is a self contained HTTP REST + JSON interface to the graph database Neo4j. Neo4jr-Social supports simple dynamic node crea… ☆166 · Updated 14 years ago
- Experimental bridge between RabbitMQ and Redis implemented as a RabbitMQ plugin ☆33 · Updated 13 years ago