ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,744 Updated last year
Alternatives and similar repositories for datasketch
Users who are interested in datasketch are comparing it to the libraries listed below.
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,521 Updated 10 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,153 Updated last year
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,396 Updated last month
- A fast Python implementation of locality sensitive hashing. ☆667 Updated 5 years ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆1,015 Updated last month
- A Python Implementation of Simhash Algorithm ☆1,023 Updated 3 years ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆13,892 Updated last year
- Header-only C++/python library for fast approximate nearest neighbors ☆4,830 Updated last month
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆919 Updated 4 years ago
- A Python nearest neighbor descent for approximate nearest neighbors ☆938 Updated 8 months ago
- ☆3,169 Updated 3 years ago
- A system for quickly generating training data with weak supervision ☆5,900 Updated last year
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,098 Updated this week
- Example Python code for comparing documents using MinHash ☆252 Updated 6 years ago
- Learning embeddings for classification, retrieval and ranking. ☆3,955 Updated 2 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆290 Updated 2 years ago
- A Collection of BM25 Algorithms in Python ☆1,217 Updated 10 months ago
- A library implementing different string similarity and distance measures using Python. ☆1,016 Updated 2 years ago
- Learning to Rank in TensorFlow ☆2,780 Updated last year
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,276 Updated 3 years ago
- Some useful tips for faiss ☆620 Updated last year
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,483 Updated 3 months ago
- Heuristic based boilerplate removal tool ☆788 Updated 5 months ago
- NLP, before and after spaCy ☆2,231 Updated last year
- Navigating Spreading-out Graph For Approximate Nearest Neighbor Search ☆685 Updated last year
- Deep recommender models using PyTorch. ☆3,025 Updated 2 years ago
- A library for k-nearest neighbor search ☆384 Updated last year
- Simhash and near-duplicate detection ☆417 Updated 2 years ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆1,903 Updated this week
- A python tool for evaluating the quality of sentence embeddings. ☆2,107 Updated last year