matpalm / resemblance
trying shingling / resemblance / simhash / sketching to do some data deduping
☆97Updated 10 years ago
Alternatives and similar repositories for resemblance
Users that are interested in resemblance are comparing it to the libraries listed below
- ☆43Updated 12 years ago
- Pretty fast parser for probabilistic context free grammars☆87Updated 12 years ago
- A high performance distributed graph database.☆131Updated 6 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash☆25Updated 16 years ago
- My original graph database DSL machine☆176Updated 4 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data.☆49Updated 8 years ago
- Ruby client library for controlling Google Refine☆44Updated 7 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample;☆194Updated 11 years ago
- Evaluate any text against a collection of match rules☆144Updated 11 years ago
- Jeremy's Machine Learning Library☆52Updated 9 years ago
- News Aggregator that classifies and clusterifies news from different sources☆46Updated 13 years ago
- An implementation of the HyperLogLog algorithm backed by Redis☆171Updated 10 years ago
- Grooveshark.com unofficial API library☆123Updated 10 years ago
- Modular Street Address Geocoder☆396Updated 13 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible.☆52Updated 8 years ago
- A minimalist realtime full-text search index☆153Updated 13 years ago
- A Seriously Fun guide to Big Data Analytics in Practice☆169Updated 10 years ago
- Ruby on Hadoop: Efficient, effective Hadoop streaming & bulk data processing. Write micro scripts for terabyte-scale data☆496Updated 11 years ago
- Example application using neo4j.rb☆45Updated 12 years ago
- Human-Powered Data Analysis with Mechanical Turk☆300Updated 12 years ago
- Github upload of Bill McNeal's ruby wrapper for Stanford's NLP library☆87Updated 15 years ago
- Fast and intuitive exploratory data analysis☆97Updated 10 years ago
- Ranked Prefix Search for Large Data on External Memory optimized for Mobile with ZERO lag initialization time☆17Updated 7 years ago
- Bayesian classifier on top of Redis☆62Updated 13 years ago
- Mneme is an HTTP web-service for recording and identifying previously seen records - aka, duplicate detection.☆108Updated 12 years ago
- Examples for Accessing Jena API through JRuby☆17Updated 14 years ago
- C++ implementation of hamming distance algorithm HmSearch using Kyoto Cabinet☆42Updated 9 years ago
- Ruby library for Redis backed time series.☆193Updated 10 years ago
- No longer supported see - https://github.com/meh/ruby-tesseract-ocr☆38Updated 12 years ago
- An unsupervised language identification algorithm in Ruby, built originally for detecting English-language tweets.☆39Updated 14 years ago