ekzhu / datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
☆2,648 Updated 8 months ago
Alternatives and similar repositories for datasketch:
Users who are interested in datasketch are comparing it to the libraries listed below:
- A fast Python implementation of locality sensitive hashing. ☆662 Updated 4 years ago
- Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-me… ☆3,445 Updated 5 months ago
- FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing) ☆1,148 Updated 8 months ago
- A Python Implementation of Simhash Algorithm ☆1,004 Updated 2 years ago
- Learning embeddings for classification, retrieval and ranking. ☆3,951 Updated 2 years ago
- ☆3,156 Updated 3 years ago
- Benchmarks of approximate nearest neighbor libraries in Python ☆5,107 Updated 3 weeks ago
- Approximate Nearest Neighbor Search for Sparse Data in Python! ☆919 Updated 4 years ago
- Example Python code for comparing documents using MinHash ☆250 Updated 6 years ago
- Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive… ☆768 Updated last year
- All-pair set similarity search on millions of sets in Python and on a laptop ☆593 Updated 2 years ago
- Python module (C extension and plain python) implementing Aho-Corasick algorithm ☆976 Updated 11 months ago
- A fast, efficient universal vector embedding utility package. ☆1,641 Updated last year
- A python tool for evaluating the quality of sentence embeddings. ☆2,094 Updated 11 months ago
- Learning to Rank in TensorFlow ☆2,762 Updated 11 months ago
- A library for Multilingual Unsupervised or Supervised word Embeddings ☆3,204 Updated 2 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,166 Updated 7 months ago
- Header-only C++/python library for fast approximate nearest neighbors ☆4,551 Updated 6 months ago
- A system for quickly generating training data with weak supervision ☆5,831 Updated 9 months ago
- Facilitating the design, comparison and sharing of deep text matching models. ☆3,850 Updated 6 months ago
- A python binding for crfsuite ☆772 Updated 4 months ago
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆13,504 Updated 6 months ago
- Generate embeddings from large-scale graph-structured data. ☆3,397 Updated 11 months ago
- NLP, before and after spaCy ☆2,215 Updated last year
- A natural language modeling framework based on PyTorch ☆6,327 Updated 2 years ago
- Language-Agnostic SEntence Representations ☆3,617 Updated 9 months ago
- XLNet: Generalized Autoregressive Pretraining for Language Understanding ☆6,182 Updated last year
- Simhash and near-duplicate detection ☆413 Updated last year
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,097 Updated last month
- InferSent sentence embeddings ☆2,285 Updated 3 years ago