ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,675 · Updated 10 months ago
Alternatives and similar repositories for datasketch:
Users who are interested in datasketch are comparing it to the libraries listed below.
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,474 · Updated 7 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,150 · Updated 10 months ago
- A Python Implementation of Simhash Algorithm ☆1,008 · Updated 3 years ago
- NLP, before and after spaCy ☆2,224 · Updated last year
- A fast Python implementation of locality sensitive hashing. ☆664 · Updated 4 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆918 · Updated 4 years ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆991 · Updated last year
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,233 · Updated this week
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆284 · Updated last year
- A library implementing different string similarity and distance measures using Python. ☆1,003 · Updated 2 years ago
- Python library for interactive topic model visualization. Port of the R LDAvis package. ☆1,826 · Updated 9 months ago
- ☆3,159 · Updated 3 years ago
- Header-only C++/python library for fast approximate nearest neighbors ☆4,637 · Updated 8 months ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆767 · Updated 2 years ago
- Multilingual text (NLP) processing toolkit ☆2,332 · Updated last year
- All-pair set similarity search on millions of sets in Python and on a laptop ☆592 · Updated 2 years ago
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,069 · Updated 3 weeks ago
- Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data ☆1,301 · Updated this week
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,275 · Updated 3 years ago
- A fast, efficient universal vector embedding utility package. ☆1,645 · Updated last year
- Topic Modelling for Humans ☆15,968 · Updated 2 months ago
- 🔮 A refreshing functional take on deep learning, compatible with your favorite libraries ☆2,842 · Updated 2 weeks ago
- A python tool for evaluating the quality of sentence embeddings. ☆2,105 · Updated last year
- Scalable Bloom Filter implemented in Python ☆1,623 · Updated 3 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,175 · Updated 9 months ago
- 🦆 Contextually-keyed word vectors ☆1,646 · Updated last year
- Deep recommender models using PyTorch. ☆3,013 · Updated 2 years ago
- Heuristic based boilerplate removal tool ☆765 · Updated last month
- Port of Google's language-detection library to Python. ☆1,787 · Updated last month
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆13,677 · Updated 8 months ago