diffeo / py-nilsimsa
Locality-sensitive hashing algorithm for text similarity comparisons
☆59 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for py-nilsimsa
- 💥 Cython hash tables that assume keys are pre-hashed ☆82 · Updated this week
- An efficient simhash implementation for python ☆125 · Updated 5 years ago
- A Cython implementation of the affine gap string distance ☆58 · Updated last year
- Python search module for fast approximate string matching ☆53 · Updated last year
- Roaring Bitmap in Cython ☆79 · Updated 6 months ago
- HAT-Trie for Python ☆87 · Updated 8 years ago
- Python wrapper for RE2 ☆99 · Updated 2 months ago
- Stop word lists in several languages ☆21 · Updated 7 years ago
- Python bindings to the Compact Language Detector ☆33 · Updated 4 years ago
- Implementations of a counting bloom, a timing bloom and a scaling timing bloom... made for working with streams! ☆42 · Updated 7 years ago
- A Tool for Embedding Strings in Vector Spaces ☆58 · Updated 5 years ago
- A Python implementation of the Metaphone and Double Metaphone algorithms ☆80 · Updated 8 months ago
- A fast Python implementation of locality sensitive hashing. ☆70 · Updated 9 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 · Updated 7 years ago
- Knowledge extraction from web data ☆92 · Updated 6 years ago
- Python bindings for Google's FarmHash ☆38 · Updated 2 months ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 2 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets. ☆144 · Updated 2 months ago
- Python wrapper for RE2 ☆295 · Updated last year
- Algorithms for "schema matching" ☆25 · Updated 8 years ago
- An efficient, immutable, persistent mapping object ☆99 · Updated 5 years ago
- Levenshtein and Hamming distance computation ☆117 · Updated 5 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆50 · Updated 9 years ago
- Interesting (non-cryptographic) hashes implemented in pure Python. ☆240 · Updated 3 years ago
- An efficient approximation for tree edit-distance. ☆46 · Updated 13 years ago
- This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet… ☆29 · Updated 2 months ago
- DAFSA-based dictionary-like read-only objects for Python. Based on `dawgdic` C++ library. ☆300 · Updated 5 months ago
- URL normalization for Python ☆94 · Updated 2 years ago
- Language detection extension for spaCy 2.0+ ☆111 · Updated 5 years ago