mattilyra / LSH
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
☆284 · Updated last year
Alternatives and similar repositories for LSH:
Users interested in LSH often compare it with the libraries listed below.
- Calculates Word Mover's Distance Insanely Fast ☆460 · Updated last year
- Example Python code for comparing documents using MinHash ☆250 · Updated 6 years ago
- A fast Python implementation of locality sensitive hashing. ☆662 · Updated 4 years ago
- A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets. ☆146 · Updated 5 months ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆768 · Updated last year
- Yet another Python binding for fastText ☆226 · Updated 6 years ago
- Fast, DB Backed pretrained word embeddings for natural language processing. ☆222 · Updated last year
- MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW ☆2,648 · Updated 8 months ago
- Named Entity Recognition based on dictionaries ☆242 · Updated 5 years ago
- A Python implementation of the BM25 ranking function. ☆234 · Updated 5 years ago
- FastXML / PFastXML / PFastreXML - Implementation of Extreme Multi-label Classification ☆147 · Updated 8 months ago
- Concatenated Power Mean Embeddings as Universal Cross-Lingual Sentence Representations ☆185 · Updated 4 years ago
- A pure python implementation of locality sensitive hashing for text documents ☆86 · Updated 9 years ago
- Various Algorithms for Short Text Mining ☆466 · Updated this week
- Palmetto is a quality measuring tool for topics ☆217 · Updated last year
- Flexible classic and NeurAl Retrieval Toolkit ☆215 · Updated last week
- Compute Sentence Embeddings Fast! ☆619 · Updated last year
- Simhash and near-duplicate detection ☆413 · Updated last year
- Quickly extract multi-word phrases from a corpus ☆190 · Updated 4 years ago
- Snorkel MeTaL: A framework for training models with multi-task weak supervision ☆424 · Updated 5 years ago
- Counter-fitting Word Vectors to Linguistic Constraints ☆144 · Updated 4 years ago
- ☆188 · Updated 8 months ago
- EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings (official implementation) ☆434 · Updated last year
- PYthon Automated Term Extraction ☆309 · Updated 2 years ago
- Named Entity Recognition data for Europeana Newspapers ☆171 · Updated last year
- Word Embeddings for Information Retrieval ☆225 · Updated last year
- Text tokenization and sentence segmentation (segtok v2) ☆201 · Updated 2 years ago
- Learning Named Entity Tagger from Domain-Specific Dictionary ☆482 · Updated 5 years ago
- Textpipe: clean and extract metadata from text ☆302 · Updated 3 years ago
- LexRank algorithm for text summarization ☆230 · Updated 10 months ago