vilda / shash
Similarity hashing
☆49Updated 14 years ago
Alternatives and similar repositories for shash
Users that are interested in shash are comparing it to the libraries listed below
- A high performance search engine☆107Updated 9 years ago
- Simhashing in C++☆136Updated 2 years ago
- 📑 SQLite extension to add the Okapi BM25 ranking algorithm☆36Updated 10 years ago
- A crawler, indexer, and query interface all in Python with distributed processing via Pyro4.☆23Updated 13 years ago
- Python Finite State Machine implementation with a pygraphviz hook☆21Updated 6 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Updated 8 years ago
- A cluster implementation of simhash near-duplicate detection☆32Updated 10 years ago
- Detecting near duplicates using Moses Charikar's algorithm☆20Updated 11 years ago
- Contains the implementation of algorithms that estimate the geographic location of media content based on their content and metadata. It …☆15Updated 9 years ago
- Attempts to determine the natural language of a selection of Unicode (utf-8) text (a clone of http://code.google.com/p/guess-language wit…☆48Updated 15 years ago
- Collects multimedia content shared through social networks.☆19Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…☆12Updated 11 years ago
- Suite of tools for detecting changes in web pages and their rendering☆55Updated 2 years ago
- C language port of google-diff-match-patch library☆41Updated 9 years ago
- epoll demo☆13Updated 8 years ago
- A simple bloom filter for SQLite using Murmur3☆18Updated 14 years ago
- simple simhashing in hadoop with cascading☆33Updated 14 years ago
- iCQA - Intelligent Community Question Answering Framework☆31Updated 9 years ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json dump. Questions? https://gitter.im/idio-opensource/Lobby☆17Updated 3 years ago
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/…☆131Updated 2 years ago
- Learning M-Way Tree - Web Scale Clustering - EM-tree, K-tree, k-means, TSVQ, repeated k-means, bitwise clustering☆78Updated 4 years ago
- ☆20Updated 8 years ago
- simple inverted index full text search engine written in python☆13Updated 12 years ago
- Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strin…☆129Updated 11 years ago
- A dynamic programming toolkit.☆39Updated 11 years ago
- A queue-controlled browser automation tool for improving web crawl quality☆64Updated 5 months ago
- Search Formula-1: A distributed high performance massive data engine for enterprise/vertical search☆170Updated 10 years ago
- Convert URL's to a normalized unicode format☆67Updated 8 years ago
- a tiny nosql database supporting pluggable storage engine.☆40Updated 8 years ago
- N-grams approximate string matching implementation in pure Python☆26Updated 15 years ago