ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,825 Updated this week
Alternatives and similar repositories for datasketch
Users who are interested in datasketch are comparing it to the libraries listed below.
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,549 Updated this week
- A fast Python implementation of locality sensitive hashing. ☆671 Updated 5 years ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,156 Updated last year
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,526 Updated 5 months ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆1,048 Updated last week
- Header-only C++/python library for fast approximate nearest neighbors ☆4,996 Updated 2 months ago
- Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data ☆1,339 Updated last month
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,115 Updated 2 weeks ago
- Example Python code for comparing documents using MinHash ☆251 Updated 6 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆920 Updated 5 years ago
- Learning embeddings for classification, retrieval and ranking. ☆3,957 Updated 3 years ago
- Python library for interactive topic model visualization. Port of the R LDAvis package. ☆1,846 Updated last year
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,508 Updated 7 months ago
- Some useful tips for faiss ☆628 Updated 3 months ago
- A Python nearest neighbor descent for approximate nearest neighbors ☆952 Updated last month
- Heuristic based boilerplate removal tool ☆806 Updated 9 months ago
- A Collection of BM25 Algorithms in Python ☆1,270 Updated last year
- A library for k-nearest neighbor search ☆385 Updated last year
- Learning to Rank in TensorFlow ☆2,782 Updated last year
- A library implementing different string similarity and distance measures using Python. ☆1,021 Updated 3 years ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆1,988 Updated last week
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆2,010 Updated last month
- Computing with Python functions. ☆4,280 Updated this week
- A high performance implementation of HDBSCAN clustering. ☆3,026 Updated 3 weeks ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆604 Updated 3 years ago
- Deep recommender models using PyTorch. ☆3,040 Updated 2 years ago
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,172 Updated last week
- Utils for streaming large files (S3, HDFS, gzip, bz2...) ☆3,413 Updated 3 weeks ago
- ☆1,253 Updated last year
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆14,073 Updated last month