hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27Updated 3 years ago
Alternatives and similar repositories for floc-simhash
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- ☆69Updated 3 years ago
- ☆30Updated 2 years ago
- An index data structure for approximate string search.☆23Updated 6 years ago
- Algorithms for "schema matching"☆26Updated 8 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching☆145Updated 7 months ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python☆138Updated 10 months ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test…☆66Updated 2 months ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata☆162Updated 2 years ago
- Blazing fast topic modelling for short texts.☆32Updated last month
- Scalable String Similarity Joins in Python☆39Updated 10 months ago
- A Named-Entity Recogniser based on Grobid.☆53Updated 3 weeks ago
- Multi-Language Identification☆28Updated 10 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆94Updated 2 years ago
- An efficient simhash implementation for python☆125Updated 5 years ago
- A browser user interface for manual labeling of record pairs.☆47Updated last year
- Code and models for our CLEF-HIPE (Named Entity Processing on Historical Newspapers) submissions☆19Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆38Updated 3 years ago
- Python search module for fast approximate string matching☆54Updated 2 years ago
- code and data used to build a training dataset for dragnet models☆10Updated 4 years ago
- The CleanCoNLL dataset from our EMNLP 2023 paper where we corrected annotation errors and inconsistencies in CoNLL-03.☆24Updated 11 months ago
- Hidden alignment conditional random field for classifying string pairs.☆24Updated 3 weeks ago
- Boolean text search in Python☆45Updated 2 years ago
- Fuzzy matching and more functionality for spaCy.☆256Updated 11 months ago
- Targeted language identifier, based on FastText and Hunspell.☆34Updated 3 months ago
- ☆70Updated 2 years ago
- Sentence transformers models for SpaCy☆107Updated 2 years ago
- An End-to-End Evaluation Framework for Entity Resolution Systems☆28Updated last year
- Boilerplate Removal using Deep Learning☆82Updated 3 years ago
- Python package for deduplication/entity resolution using active learning☆80Updated 9 months ago
- PDF parser powered by grobid☆27Updated 10 months ago