vilda / shash
Similarity hashing
☆49 Updated 13 years ago
Alternatives and similar repositories for shash
Users who are interested in shash are comparing it to the libraries listed below
- A high performance search engine ☆106 Updated 8 years ago
- Clone version of LingPipe 4.1.0, with support for unsupervised training ☆32 Updated 11 years ago
- A framework for building reranking models. ☆28 Updated 10 years ago
- Google all-pairs similarity search package, with SWIG bindings ☆22 Updated 10 years ago
- A collection of generic, C++ Bloom Filter classes developed for the Boost C++ Libraries. ☆24 Updated 8 years ago
- A Java implementation of a Double Array Trie ☆122 Updated 14 years ago
- Simhashing in C++ ☆132 Updated 2 years ago
- Learning M-Way Tree - Web Scale Clustering - EM-tree, K-tree, k-means, TSVQ, repeated k-means, bitwise clustering ☆74 Updated 3 years ago
- C++ implementation of the MMSEG word segmentation algorithm ☆33 Updated 9 years ago
- 📑 SQLite extension to add the Okapi BM25 ranking algorithm ☆35 Updated 9 years ago
- Detecting near duplicates using Moses Charikar's algorithm ☆20 Updated 10 years ago
- MMSEG simple word segmenter in C++ 11 ☆17 Updated 10 years ago
- Feed-forward Bloom filters ☆52 Updated 14 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 Updated 8 years ago
- Stand-alone recommender system from Myrrix ☆108 Updated last year
- An Implementation of Two-Trie and Tail-Trie using Double Array ☆21 Updated 12 years ago
- A light weight, low level embedded key-value database library ☆32 Updated 11 years ago
- ☆21 Updated 11 years ago
- Search Formula-1: A distributed high performance massive data engine for enterprise/vertical search ☆168 Updated 10 years ago
- Tools to evaluate accuracies of various (research papers') metadata extraction libraries ☆11 Updated 9 years ago
- Static Double Array Trie (DASTrie) ☆31 Updated 13 years ago
- C library for efficient string matching with Aho-Corasick ☆21 Updated 13 years ago
- Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strin… ☆129 Updated 11 years ago
- Implementation of Alexander A. Stepanov's inverted index compression algorithms ☆21 Updated 9 years ago
- A command line tool for bulk geolocation queries written in C++. ☆58 Updated last year
- Parser for Sogou Input Method cell dictionaries ☆15 Updated 11 years ago
- A Java library capable of constructing character-sequence-storing, directed acyclic graphs of minimal size ☆43 Updated 12 years ago
- A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from … ☆26 Updated 10 years ago
- A fast and comprehensive Java library capable of performing automaton and non-automaton based Levenshtein distance determination and neig… ☆42 Updated 12 years ago
- Recursively scans HTML pages for URLs and downloads desired content. ☆12 Updated 8 years ago