ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,810 Updated this week
Alternatives and similar repositories for datasketch
Users who are interested in datasketch are comparing it to the libraries listed below.
- A fast Python implementation of locality sensitive hashing. ☆670 Updated 5 years ago
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,547 Updated 3 weeks ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆1,044 Updated 2 weeks ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,156 Updated last year
- A Python Implementation of Simhash Algorithm ☆1,028 Updated 3 years ago
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,495 Updated 5 months ago
- Example Python code for comparing documents using MinHash ☆251 Updated 6 years ago
- A high performance implementation of HDBSCAN clustering. ☆3,019 Updated this week
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,114 Updated 3 weeks ago
- A library implementing different string similarity and distance measures using Python. ☆1,022 Updated 3 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆292 Updated 2 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆920 Updated 5 years ago
- A Collection of BM25 Algorithms in Python ☆1,265 Updated last year
- A Python nearest neighbor descent for approximate nearest neighbors ☆951 Updated 3 weeks ago
- Header-only C++/python library for fast approximate nearest neighbors ☆4,981 Updated 2 months ago
- Learning embeddings for classification, retrieval and ranking. ☆3,958 Updated 2 years ago
- Anserini is a Lucene toolkit for reproducible information retrieval research ☆1,085 Updated this week
- Automatically create Faiss knn indices with the most optimal similarity search parameters. ☆877 Updated last week
- Simhash and near-duplicate detection ☆420 Updated 2 years ago
- Some useful tips for faiss ☆627 Updated 2 months ago
- Fast Python Collaborative Filtering for Implicit Feedback Datasets ☆3,729 Updated last year
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,166 Updated 3 weeks ago
- A fast, efficient universal vector embedding utility package. ☆1,651 Updated 2 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,275 Updated 4 years ago
- Learning to Rank in TensorFlow ☆2,782 Updated last year
- A system for quickly generating training data with weak supervision ☆5,921 Updated last year
- Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f… ☆1,867 Updated 2 weeks ago
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ☆2,772 Updated 6 months ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆603 Updated 3 years ago
- A python tool for evaluating the quality of sentence embeddings. ☆2,106 Updated last year