hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27Updated 3 years ago
Alternatives and similar repositories for floc-simhash:
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- ☆69Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata☆158Updated 2 years ago
- ☆30Updated 2 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆94Updated 2 years ago
- Scalable String Similarity Joins in Python☆39Updated 8 months ago
- An efficient simhash implementation for Python☆124Updated 5 years ago
- code and data used to build a training dataset for dragnet models☆10Updated 4 years ago
- Abydos NLP/IR library for Python☆185Updated 2 years ago
- Hidden alignment conditional random field for classifying string pairs.☆24Updated 6 months ago
- ☆70Updated 2 years ago
- spaCy match and replace, maintaining conjugation☆35Updated 2 years ago
- Blazing fast topic modelling for short texts.☆31Updated 2 months ago
- Trying to generate name synonyms from wikidata☆32Updated 4 years ago
- Python package for deduplication/entity resolution using active learning☆77Updated 7 months ago
- BERT and ELECTRA models trained on Europeana Newspapers☆37Updated 3 years ago
- spaCy entry points for Curated Transformers☆27Updated 6 months ago
- Fuzzy matching and more functionality for spaCy.☆256Updated 8 months ago
- Locality-sensitive hashing algorithm for text similarity comparisons☆58Updated 3 years ago
- Language detection using spaCy and fastText☆55Updated last year
- Information extraction from English and German texts based on predicate logic☆135Updated last year
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality …☆106Updated last year
- Multi-Language Identification☆29Updated 8 months ago
- A machine learning tool for fishing entities☆263Updated this week
- Extract networks of entities from journalistic reporting☆48Updated last year
- Algorithms for "schema matching"☆26Updated 8 years ago
- Anonymization of legal cases (Fr) based on Flair embeddings☆88Updated 4 years ago
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex.☆62Updated last week
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆154Updated 4 months ago
- A spaCy wrapper for DBpedia Spotlight☆109Updated 2 years ago
- ☆54Updated last year