diffeo / py-nilsimsa
Locality-sensitive hashing algorithm for text similarity comparisons
☆58 Updated 3 years ago
Alternatives and similar repositories for py-nilsimsa:
Users who are interested in py-nilsimsa compare it to the libraries listed below.
- A fast Python implementation of locality sensitive hashing. ☆70 Updated 10 years ago
- Roaring Bitmap in Cython ☆81 Updated 10 months ago
- 💥 Cython hash tables that assume keys are pre-hashed ☆86 Updated 2 months ago
- HAT-Trie for Python ☆86 Updated 9 years ago
- Python search module for fast approximate string matching ☆54 Updated 2 years ago
- An efficient simhash implementation for python ☆124 Updated 5 years ago
- Probabilistic Data Structures in Python (originally presented at PyData 2013) ☆55 Updated 3 years ago
- Python wrapper for RE2 ☆296 Updated last year
- Python to Gremlin Graph Abstraction Layer ☆54 Updated 7 years ago
- implementations of a counting bloom, a timing bloom and a scaling timing bloom... made for working with streams! ☆42 Updated 8 years ago
- Sometimes you just need a lot of text. Plainstream is a small Python app that provides you with a plain text stream directly from Wikiped… ☆24 Updated last year
- Python bindings to the Compact Language Detector ☆33 Updated 4 years ago
- extract difference between two html pages ☆32 Updated 6 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 Updated 7 years ago
- It is a forest of random projection trees ☆224 Updated 5 years ago
- Simple approximate-nearest-neighbours in Python using locality sensitive hashing. ☆140 Updated 12 years ago
- a pure python MurmurHash3 implementation. ☆68 Updated 5 years ago
- Algorithms for "schema matching" ☆26 Updated 8 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago
- An Exploration into Graph Databases ☆28 Updated 9 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- An index data structure for approximate string search. ☆23 Updated 5 years ago
- A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived'] ☆82 Updated 8 years ago
- Semanticizest: dump parser and client ☆20 Updated 8 years ago
- A Cython implementation of the affine gap string distance ☆57 Updated 2 years ago
- Simple spill-to-disk dictionary ☆60 Updated 3 years ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- Naïve Bayesian Text Classifier on Redis ☆116 Updated 5 years ago
- Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig… ☆99 Updated 9 years ago
- ☆38 Updated 9 years ago