mattilyra / LSH
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
☆288 Updated 2 years ago
Alternatives and similar repositories for LSH
Users who are interested in LSH compare it to the libraries listed below.
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆147 Updated 9 months ago
- Example Python code for comparing documents using MinHash ☆251 Updated 6 years ago
- Simhash and near-duplicate detection ☆416 Updated 2 years ago
- Calculates Word Mover's Distance Insanely Fast ☆461 Updated last year
- Compute Sentence Embeddings Fast! ☆623 Updated 2 years ago
- A fast implementation of GloVe, with optional retrofitting ☆244 Updated 2 years ago
- Semantic Text Similarity Dataset Hub ☆717 Updated 7 years ago
- A Python implementation of the BM25 ranking function. ☆235 Updated 5 years ago
- Fast, DB Backed pretrained word embeddings for natural language processing. ☆222 Updated 2 months ago
- Flexible classic and NeurAl Retrieval Toolkit ☆217 Updated 4 months ago
- Word Embeddings for Information Retrieval ☆225 Updated last year
- Counter-fitting Word Vectors to Linguistic Constraints ☆144 Updated 5 years ago
- Document ranking via sentence modeling using BERT ☆144 Updated 2 years ago
- Python library for Natural Language Preprocessing (NLPre) ☆191 Updated last year
- Making sense embedding out of word embeddings using graph-based word sense induction ☆213 Updated 4 years ago
- Named Entity Recognition based on dictionaries ☆242 Updated 6 years ago
- Dynamic Meta-Embeddings for Improved Sentence Representations ☆332 Updated 4 years ago
- Language independent truecaser in Python. ☆160 Updated 3 years ago
- Palmetto is a quality measuring tool for topics ☆216 Updated last year
- scikit-learn inspired API for CRFsuite ☆430 Updated last year
- ☆214 Updated 6 years ago
- Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018) ☆157 Updated 6 years ago
- Doc2VecC from the paper "Efficient Vector Representation for Documents through Corruption" ☆186 Updated 8 years ago
- Twitter named entity extraction for WNUT 2016 http://noisy-text.github.io/2016/ner-shared-task.html ☆139 Updated 2 years ago
- General purpose unsupervised sentence representations ☆1,204 Updated 2 years ago
- LASER multilingual sentence embeddings as a pip package ☆224 Updated last year
- Language detection extension for spaCy 2.0+ ☆113 Updated 6 years ago
- Sentence2vec by Rock ☆311 Updated 2 months ago
- Various Algorithms for Short Text Mining ☆471 Updated 2 weeks ago
- Quickly extract multi-word phrases from a corpus ☆191 Updated 4 years ago