mattilyra / LSH
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
☆286 Updated last year
Alternatives and similar repositories for LSH:
Users who are interested in LSH are comparing it to the libraries listed below:
- Example Python code for comparing documents using MinHash ☆251 Updated 6 years ago
- Simhash and near-duplicate detection ☆415 Updated last year
- Calculates Word Mover's Distance Insanely Fast ☆462 Updated last year
- A pure python implementation of locality sensitive hashing for text documents ☆85 Updated 9 years ago
- Intelligently expand and create contractions in text leveraging grammar checking and Word Mover's Distance. ☆77 Updated 3 years ago
- Palmetto is a quality measuring tool for topics ☆216 Updated last year
- A fast Python implementation of locality sensitive hashing. ☆664 Updated 5 years ago
- LASER multilingual sentence embeddings as a pip package ☆223 Updated last year
- Fast, DB Backed pretrained word embeddings for natural language processing. ☆222 Updated last month
- Quickly extract multi-word phrases from a corpus ☆191 Updated 4 years ago
- Yet another Python binding for fastText ☆225 Updated 6 years ago
- scikit-learn inspired API for CRFsuite ☆430 Updated last year
- Counter-fitting Word Vectors to Linguistic Constraints ☆144 Updated 5 years ago
- LexRank algorithm for text summarization ☆230 Updated last year
- Concatenated Power Mean Embeddings as Universal Cross-Lingual Sentence Representations ☆185 Updated 4 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆169 Updated 3 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆146 Updated 8 months ago
- Sentence2vec by Rock ☆312 Updated last month
- ☆212 Updated 6 years ago
- A Python implementation of the BM25 ranking function. ☆234 Updated 5 years ago
- Making sense embedding out of word embeddings using graph-based word sense induction ☆213 Updated 3 years ago
- Implementation of unsupervised smoothed inverse frequency (Best Paper, Repl4NLP @ ACL 2018) ☆77 Updated 6 years ago
- Word Mover's Distance from Matthew J Kusner's paper "From Word Embeddings to Document Distances" ☆538 Updated 11 months ago
- Character-based word embeddings model based on RNN for handling real world texts ☆174 Updated last year
- Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018) ☆158 Updated 5 years ago
- Twitter named entity extraction for WNUT 2016 http://noisy-text.github.io/2016/ner-shared-task.html ☆138 Updated 2 years ago
- Flexible classic and NeurAl Retrieval Toolkit ☆217 Updated 3 months ago
- Word Embeddings for Information Retrieval ☆225 Updated last year
- Language independent truecaser in Python. ☆160 Updated 3 years ago
- Tool for interactive embeddings visualization ☆310 Updated 8 months ago