vilda / shash
Similarity hashing
☆48Updated 13 years ago
Related projects
Alternatives and complementary repositories for shash
- Detecting near duplicates using Moses Charikar's algorithm☆20Updated 10 years ago
- Google all-pairs similarity search package, with SWIG bindings☆23Updated 9 years ago
- Simhashing in C++☆134Updated last year
- A fast and comprehensive Java library capable of performing automaton and non-automaton based Levenshtein distance determination and neig…☆41Updated 11 years ago
- Online news article (HTML pages) context extraction using Maximum Subsequence Segmentation Algorithm as presented by Pasternack and Roth☆17Updated 7 years ago
- A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from …☆28Updated 10 years ago
- A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns.☆28Updated 8 years ago
- Forever incomplete suite of tools for an orthographic/grammatical checker☆28Updated 4 years ago
- A framework for building reranking models.☆29Updated 9 years ago
- A high performance search engine☆102Updated 7 years ago
- Library for Character/Word n-gram Analysis☆22Updated 7 years ago
- Lightning fast spell correction / fuzzy search library based on SymSpell by Commerce-Experts☆80Updated 6 years ago
- Java text categorization system☆54Updated 7 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene☆58Updated 11 years ago
- High-performance Concurrent Cuckoo Hashing Library☆46Updated 9 years ago
- 📑 SQLite extension to add the Okapi BM25 ranking algorithm☆35Updated 9 years ago
- An Implementation of Two-Trie and Tail-Trie using Double Array☆21Updated 11 years ago
- Learning M-Way Tree - Web Scale Clustering - EM-tree, K-tree, k-means, TSVQ, repeated k-means, bitwise clustering☆75Updated 2 years ago
- This provides tools for the b-bit MinHash algorithm.☆35Updated 10 months ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…☆48Updated 2 years ago
- Stanford Tregex-inspired language for rule-based dependency tree manipulation.☆21Updated 7 years ago
- SALM: Suffix Array and its Applications in Empirical Language Processing by Joy☆11Updated 6 years ago
- A Java library capable of constructing character-sequence-storing, directed acyclic graphs of minimal size☆43Updated 11 years ago
- General purpose C++ library for iZENECloud☆42Updated 9 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Updated 7 years ago
- A C++ template library for compact Hamming distance indexes☆10Updated 7 years ago
- Implementation of many similarity join algorithms.☆15Updated 10 years ago
- Feed-forward Bloom filters☆52Updated 13 years ago
- A comparison between different integer set techniques☆14Updated 6 years ago