ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,661 Updated 9 months ago
Alternatives and similar repositories for datasketch:
Users who are interested in datasketch are comparing it to the libraries listed below.
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,460 Updated 6 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,148 Updated 9 months ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆767 Updated 2 years ago
- A fast Python implementation of locality sensitive hashing. ☆661 Updated 4 years ago
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,177 Updated last week
- A Python Implementation of Simhash Algorithm ☆1,005 Updated 3 years ago
- ☆3,155 Updated 3 years ago
- A fast, efficient universal vector embedding utility package. ☆1,644 Updated last year
- A python tool for evaluating the quality of sentence embeddings. ☆2,101 Updated last year
- A system for quickly generating training data with weak supervision ☆5,842 Updated 10 months ago
- Learning embeddings for classification, retrieval and ranking. ☆3,950 Updated 2 years ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆918 Updated 4 years ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆983 Updated last year
- Header-only C++/python library for fast approximate nearest neighbors ☆4,606 Updated 7 months ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆592 Updated 2 years ago
- InferSent sentence embeddings ☆2,284 Updated 3 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆285 Updated last year
- ☆1,196 Updated 7 months ago
- sentence embedding by Smooth Inverse Frequency weighting scheme ☆1,085 Updated 5 years ago
- A tool for extracting plain text from Wikipedia dumps ☆3,832 Updated 10 months ago
- General purpose unsupervised sentence representations ☆1,202 Updated 2 years ago
- Scalable Bloom Filter implemented in Python ☆1,622 Updated 3 years ago
- Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data ☆1,292 Updated last month
- Topic modeling with latent Dirichlet allocation using Gibbs sampling ☆1,272 Updated 7 months ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,457 Updated 6 months ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆13,621 Updated 7 months ago
- Multilingual text (NLP) processing toolkit ☆2,330 Updated last year
- A large annotated semantic parsing corpus for developing natural language interfaces. ☆1,708 Updated last year
- Python library for interactive topic model visualization. Port of the R LDAvis package. ☆1,823 Updated 8 months ago
- Entity Linker solution ☆1,185 Updated last year