1e0ng / simhash
A Python Implementation of Simhash Algorithm
☆1,019 Updated 3 years ago
Alternatives and similar repositories for simhash
Users who are interested in simhash are comparing it to the libraries listed below:
- Simhash and near-duplicate detection ☆416 Updated 2 years ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆1,007 Updated 2 weeks ago
- An efficient simhash implementation for python ☆125 Updated 5 years ago
- Scalable Bloom Filter implemented in Python ☆1,621 Updated 4 years ago
- A simple short-text classification tool based on LibLinear ☆682 Updated 4 years ago
- Pure python Aho-Corasick library. ☆216 Updated 2 years ago
- Computes the similarity of Chinese texts using TF feature vectors and simhash fingerprints ☆216 Updated 8 years ago
- Python extension module for accelerating regular expressions using libesm ☆131 Updated last year
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,730 Updated last year
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆289 Updated 2 years ago
- CRF++: Yet Another CRF toolkit ☆508 Updated 4 months ago
- A fast Python implementation of locality sensitive hashing. ☆664 Updated 5 years ago
- Fast, efficiently stored Trie for Python. Uses libdatrie. ☆537 Updated last year
- A python implementation of the Rapid Automatic Keyword Extraction ☆373 Updated 7 years ago
- A library implementing different string similarity and distance measures using Python. ☆1,014 Updated 2 years ago
- A Chinese word segmenter based on CRF ☆233 Updated 6 years ago
- scikit-learn inspired API for CRFsuite ☆431 Updated last year
- Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions. ☆342 Updated 3 months ago
- A python binding for crfsuite ☆774 Updated 9 months ago
- Fast Redis Bloom Filters in Python ☆290 Updated 6 years ago
- Python library implementing a trie data structure. ☆822 Updated 4 years ago
- An extremely simple Python library to perform TF-IDF document comparison. ☆243 Updated 4 years ago
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,092 Updated last week
- Constants used in Chinese text processing ☆373 Updated 6 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,152 Updated last year
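- A word similarity calculation method based on the extended version of the Harbin Institute of Technology synonym thesaurus (Tongyici Cilin) ☆367 Updated 2 years ago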
- Four word embedding models implemented in Python. Supporting arbitrary context features ☆851 Updated 5 years ago
- An Efficient Lexical Analyzer for Chinese ☆810 Updated 2 years ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆162 Updated 4 years ago
- CRFsuite: a fast implementation of Conditional Random Fields (CRFs) ☆659 Updated last year