ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,736 · Updated last year
Alternatives and similar repositories for datasketch
Users who are interested in datasketch are comparing it to the libraries listed below.
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,515 · Updated 9 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,153 · Updated last year
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆767 · Updated 2 years ago
- A fast Python implementation of locality sensitive hashing. ☆664 · Updated 5 years ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆1,008 · Updated last month
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,350 · Updated last month
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆919 · Updated 4 years ago
- Header-only C++/python library for fast approximate nearest neighbors ☆4,793 · Updated 3 weeks ago
- A high performance implementation of HDBSCAN clustering. ☆2,959 · Updated 2 months ago
- A Collection of BM25 Algorithms in Python ☆1,208 · Updated 9 months ago
- Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data ☆1,314 · Updated 3 weeks ago
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,477 · Updated 3 months ago
- NLP, before and after spaCy ☆2,228 · Updated last year
- Python library for interactive topic model visualization. Port of the R LDAvis package. ☆1,834 · Updated last year
- A fast, efficient universal vector embedding utility package. ☆1,649 · Updated last year
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,185 · Updated last month
- Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library. ☆1,092 · Updated 2 weeks ago
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ☆2,772 · Updated 2 months ago
- A python tool for evaluating the quality of sentence embeddings. ☆2,107 · Updated last year
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆1,889 · Updated last week
- Multilingual text (NLP) processing toolkit ☆2,346 · Updated last year
- Learning to Rank in TensorFlow ☆2,778 · Updated last year
- ☆1,228 · Updated 11 months ago
- Python Keyphrase Extraction module ☆1,580 · Updated 2 years ago
- A library implementing different string similarity and distance measures using Python. ☆1,014 · Updated 2 years ago
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,144 · Updated last month
- A library of sklearn compatible categorical variable encoders ☆2,450 · Updated 3 weeks ago
- Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f… ☆1,847 · Updated last year
- A system for quickly generating training data with weak supervision ☆5,882 · Updated last year
- A Python nearest neighbor descent for approximate nearest neighbors ☆931 · Updated 8 months ago