matpalm / resemblance
trying shingling / resemblance / simhash / sketching to do some data deduping
☆98 · Updated 10 years ago
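The repository's own code isn't reproduced on this page. To illustrate what "shingling / resemblance / sketching" means for deduping, here is a minimal Python sketch. It assumes word-level 3-shingles and uses seeded BLAKE2b hashes as a stand-in for MinHash's random permutations. All function names are hypothetical and are not taken from the repository, and simhash is not covered.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of word-level k-shingles (every run of k consecutive words)."""
    words = text.lower().split()
    if len(words) < k:
        # Short documents become a single shingle (or nothing if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact resemblance of two shingle sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed: int, item: str) -> int:
    # 64-bit hash; a different salt per seed simulates an independent hash function.
    h = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                        salt=seed.to_bytes(16, "little"))
    return int.from_bytes(h.digest(), "little")


def minhash_signature(items: Set[str], num_hashes: int = 128) -> List[int]:
    """Sketch a set as the minimum hash value under each of num_hashes seeded hashes."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_resemblance(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of signature positions that agree estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumped over the lazy dog")
    print(jaccard(a, b))  # exact: 0.4 (4 shared shingles out of 10 distinct)
    print(estimate_resemblance(minhash_signature(a), minhash_signature(b)))
```

For deduping, each document is stored as a fixed-size signature instead of its full shingle set. Pairs whose estimated resemblance exceeds a chosen threshold are then treated as near-duplicates. The 128-hash signature length and 3-word shingle size are arbitrary choices for illustration.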
Alternatives and similar repositories for resemblance
Users that are interested in resemblance are comparing it to the libraries listed below
- Pretty fast parser for probabilistic context free grammars ☆87 · Updated 12 years ago
- A high performance distributed graph database. ☆131 · Updated 6 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 · Updated 8 years ago
- My original graph database DSL machine ☆176 · Updated 4 years ago
- Ruby client library for controlling Google Refine ☆44 · Updated 7 years ago
- ☆43 · Updated 12 years ago
- Evaluate any text against a collection of match rules ☆144 · Updated 11 years ago
- News Aggregator that classifies and clusterifies news from different sources ☆46 · Updated 13 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 · Updated 12 years ago
- An implementation of the HyperLogLog algorithm backed by Redis ☆171 · Updated 9 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆25 · Updated 16 years ago
- An unsupervised language identification algorithm in Ruby, built originally for detecting English-language tweets. ☆39 · Updated 14 years ago
- Github upload of Bill McNeal's ruby wrapper for Stanford's NLP library ☆87 · Updated 15 years ago
- A minimalist realtime full-text search index ☆152 · Updated 13 years ago
- Bayesian classifier on top of Redis ☆62 · Updated 13 years ago
- Analysis of Github Commits Comments ☆36 · Updated 8 years ago
- Neo4jr-Social is a self contained HTTP REST + JSON interface to the graph database Neo4j. Neo4jr-Social supports simple dynamic node crea… ☆167 · Updated 15 years ago
- postgres 9.2 visibility ☆257 · Updated 11 years ago
- Jeremy's Machine Learning Library ☆52 · Updated 9 years ago
- Social sentiment flagger intended to judge given text as: positive, neutral or negative. ☆130 · Updated 13 years ago
- Ruby Linear Algebra Library ☆108 · Updated 16 years ago
- Stream-based InputFormat for processing the compressed XML dumps of Wikipedia with Hadoop ☆85 · Updated 12 years ago
- A prototype for a pure ruby plugin ☆22 · Updated 13 years ago
- Redis bulk-loader for Apache Pig ☆40 · Updated 13 years ago
- ☆116 · Updated 13 years ago
- Examples for Accessing Jena API through JRuby ☆17 · Updated 14 years ago
- A set of command-line statistics tools ☆29 · Updated 10 years ago
- Gitan is a very basic web interface to create and inspect bare git repositories ☆45 · Updated 14 years ago
- Modular Street Address Geocoder ☆395 · Updated 13 years ago
- Full text search with any type of class or data store using Redis ☆162 · Updated 15 years ago