ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,583 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for datasketch
- A fast Python implementation of locality sensitive hashing. ☆661 · Updated 4 years ago
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,417 · Updated 2 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,140 · Updated 5 months ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆951 · Updated 8 months ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆766 · Updated last year
- Benchmarks of approximate nearest neighbor libraries in Python ☆4,974 · Updated 3 weeks ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆282 · Updated last year
- A Python Implementation of Simhash Algorithm ☆982 · Updated 2 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆916 · Updated 4 years ago
- A system for quickly generating training data with weak supervision ☆5,812 · Updated 6 months ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆13,264 · Updated 3 months ago
- Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data ☆1,258 · Updated last week
- Generate embeddings from large-scale graph-structured data. ☆3,387 · Updated 8 months ago
- DeepDive ☆1,958 · Updated 2 years ago
- Header-only C++/python library for fast approximate nearest neighbors ☆4,389 · Updated 3 months ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆590 · Updated 2 years ago
- A Python nearest neighbor descent for approximate nearest neighbors ☆896 · Updated last week
- Learning embeddings for classification, retrieval and ranking. ☆3,947 · Updated last year
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,395 · Updated 2 months ago
- A python binding for crfsuite ☆771 · Updated last month
- A fast, efficient universal vector embedding utility package. ☆1,627 · Updated last year
- ☆3,149 · Updated 3 years ago
- A python tool for evaluating the quality of sentence embeddings. ☆2,087 · Updated 8 months ago
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,047 · Updated last month
- Example Python code for comparing documents using MinHash ☆250 · Updated 5 years ago
- Navigating Spreading-out Graph For Approximate Nearest Neighbor Search ☆638 · Updated 11 months ago
- Python library implementing a trie data structure. ☆816 · Updated 3 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,150 · Updated 4 months ago
- 🔮 A refreshing functional take on deep learning, compatible with your favorite libraries ☆2,820 · Updated last month
- Scalable Bloom Filter implemented in Python ☆1,619 · Updated 3 years ago