hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 Updated 3 years ago
Alternatives and similar repositories for floc-simhash
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- ☆30 Updated 3 years ago
- code and data used to build a training dataset for dragnet models ☆10 Updated 4 years ago
- Abydos NLP/IR library for Python ☆190 Updated 2 years ago
- ☆69 Updated 3 years ago
- Language detection using Spacy and Fasttext ☆57 Updated last year
- Multi-Language Identification ☆28 Updated last year
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆141 Updated last year
- Text tokenization and sentence segmentation (segtok v2) ☆206 Updated 3 years ago
- 📂 Additional lookup tables and data resources for spaCy ☆107 Updated 3 months ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated last month
- A machine learning tool for fishing entities ☆266 Updated 4 months ago
- Performance evaluation of nearest neighbor search using Vespa, Elasticsearch and Open Distro for Elasticsearch K-NN ☆117 Updated 4 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆291 Updated 2 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆154 Updated 2 years ago
- Text Mining and Topic Modeling Toolkit for Python with parallel processing power ☆190 Updated 2 years ago
- Fuzzy matching and more functionality for spaCy. ☆258 Updated last year
- ☆70 Updated 2 years ago
- Extracts a latent knowledge graph from text and index/query it in elasticsearch or solr ☆21 Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆164 Updated 2 years ago
- Find strings/words in text; convenience and C speed ☆127 Updated 3 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 Updated 2 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆175 Updated 3 months ago
- Sentence transformers models for SpaCy ☆107 Updated 2 years ago
- Blazing fast topic modelling for short texts. ☆33 Updated 2 months ago
- Information extraction from English and German texts based on predicate logic ☆138 Updated 2 years ago
- Library for unit extraction - fork of quantulum for python3 ☆142 Updated last year
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆147 Updated 11 months ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 Updated 2 years ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated last year
- 🌸 Train floret vectors ☆18 Updated 2 years ago