hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 Updated 3 years ago
Alternatives and similar repositories for floc-simhash
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- code and data used to build a training dataset for dragnet models ☆10 Updated 4 years ago
- ☆30 Updated 3 years ago
- ☆69 Updated 3 years ago
- In this project, we need to find out commercial products listed on Google that refer to the same entity across Amazon by comparing the si… ☆11 Updated 8 years ago
- Python package for deduplication/entity resolution using active learning ☆80 Updated 10 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 Updated 2 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆162 Updated 2 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated last week
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 Updated 2 years ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated last year
- Abydos NLP/IR library for Python ☆186 Updated 2 years ago
- Algorithms for "schema matching" ☆26 Updated 8 years ago
- Blazing fast topic modelling for short texts. ☆32 Updated 2 months ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 5 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 Updated 2 years ago
- Web content extraction using machine learning ☆33 Updated 4 years ago
- Sentence transformers models for SpaCy ☆107 Updated 2 years ago
- Record Linkage ToolKit (Find and link entities) ☆110 Updated last year
- Scalable String Similarity Joins in Python ☆39 Updated 11 months ago
- Locality-sensitive hashing algorithm for text similarity comparisons ☆58 Updated 2 months ago
- Dataframe Integration with spaCy. ☆103 Updated 4 years ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated last year
- A Named-Entity Recogniser based on Grobid. ☆53 Updated last month
- Annotation Management for Prodigy, that support multiple users working in many projects ☆15 Updated 6 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆144 Updated 8 months ago
- Sentiment Corpus for Swedish 🇸🇪 Norwegian 🇳🇴 Danish 🇩🇰 Finnish 🇫🇮 (and English 🏴) ☆15 Updated 4 years ago
- Finds linguistic patterns effortlessly ☆36 Updated last year
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 Updated 3 years ago
- 🌸 Train floret vectors ☆18 Updated 2 years ago
- Language detection using Spacy and Fasttext ☆55 Updated last year