1e0ng / simhash
A Python Implementation of Simhash Algorithm
☆995 Updated 2 years ago
Alternatives and similar repositories for simhash:
Users who are interested in simhash are comparing it to the libraries listed below.
- Simhash and near-duplicate detection ☆413 Updated last year
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆967 Updated 9 months ago
- An efficient simhash implementation for python ☆125 Updated 5 years ago
- Scalable Bloom Filter implemented in Python ☆1,619 Updated 3 years ago
- A python binding for crfsuite ☆771 Updated 3 months ago
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,635 Updated 7 months ago
- Topic modeling with latent Dirichlet allocation using Gibbs sampling ☆1,253 Updated 5 months ago
- Pure python Aho-Corasick library. ☆214 Updated 2 years ago
- Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages ☆542 Updated 3 years ago
- A fast Python implementation of locality sensitive hashing. ☆661 Updated 4 years ago
- Compute the similarity of Chinese texts using TF feature vectors and simhash fingerprints ☆212 Updated 8 years ago
- Python extension module for accelerating regular expressions using libesm ☆132 Updated last year
- A simple short-text classification tool based on LibLinear ☆676 Updated 3 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,267 Updated 3 years ago
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,050 Updated 3 months ago
- A python implementation of the Rapid Automatic Keyword Extraction ☆375 Updated 6 years ago
- Four word embedding models implemented in Python. Supporting arbitrary context features ☆847 Updated 5 years ago
- sentence embedding by Smooth Inverse Frequency weighting scheme ☆1,086 Updated 5 years ago
- CRF++: Yet Another CRF toolkit ☆506 Updated 3 years ago
- An Efficient Lexical Analyzer for Chinese ☆2,034 Updated 2 years ago
- A Toolkit for Industrial Topic Modeling ☆2,639 Updated 3 years ago
- A python implementation of the Rapid Automatic Keyword Extraction ☆975 Updated 4 years ago
- Constants used in Chinese text processing ☆365 Updated last month
- Python library implementing a trie data structure. ☆818 Updated 3 years ago
- pyltp: the python extension for LTP ☆1,538 Updated 2 years ago
- A Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software. ☆573 Updated last month
- Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/ ☆1,256 Updated 2 years ago
- This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representa… ☆1,643 Updated 3 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆385 Updated 2 years ago
- word2vec/glove/swivel binary file on chinese corpus ☆399 Updated 8 years ago