hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 · Updated 3 years ago
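To illustrate what SimHash computes, here is a minimal pure-Python sketch of the classic SimHash construction. It is not the floc-simhash API. The function names `token_hash`, `simhash` and `hamming_distance` are hypothetical, and the feature choices are assumptions made for illustration: lowercase whitespace tokens, frequency weights, and MD5 as the per-token hash. Each token's hash casts a weighted vote on every bit position. The fingerprint keeps a bit wherever the total vote is positive, so similar inputs tend to share most bits.

```python
import hashlib
from collections import Counter


def token_hash(token: str, bits: int) -> int:
    """Stable hash of a token, truncated to `bits` bits (MD5 yields 128 bits)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(text: str, bits: int = 64) -> int:
    """Return a `bits`-bit SimHash fingerprint of `text` (illustrative sketch)."""
    if not 1 <= bits <= 128:
        raise ValueError("bits must be between 1 and 128 for an MD5-based hash")
    # Assumed features: lowercase whitespace tokens, weighted by frequency.
    weights = Counter(text.lower().split())
    totals = [0] * bits
    for token, weight in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # Vote +weight if bit i of the token hash is set, -weight otherwise.
            totals[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 where the weighted vote is positive.
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-duplicate inputs."""
    return bin(a ^ b).count("1")


a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumped over the lazy dog")
print(hamming_distance(a, b))
```

Near-duplicate texts tend to produce fingerprints that are a small Hamming distance apart, which is why SimHash is used for deduplication. A suitable distance threshold depends on the data and the feature choice, so treat it as something to tune.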
Related projects
Alternatives and complementary repositories for floc-simhash
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- ☆29 · Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆48 · Updated this week
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆153 · Updated 2 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆25 · Updated 2 months ago
- Language detection using spaCy and fastText ☆54 · Updated 11 months ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆65 · Updated 4 years ago
- Code and data used to build a training dataset for Dragnet models ☆10 · Updated 3 years ago
- ReconNER: debug annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality of your data. ☆34 · Updated 4 years ago
- Recon NER: debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 · Updated 8 months ago
- Boilerplate Removal using Deep Learning ☆82 · Updated 2 years ago
- Extracts a latent knowledge graph from text and indexes/queries it in Elasticsearch or Solr ☆19 · Updated 2 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆91 · Updated last year
- ☆70 · Updated last year
- A browser user interface for manual labeling of record pairs. ☆41 · Updated last year
- spaCy match and replace, maintaining conjugation ☆34 · Updated last year
- spaCy entry points for Curated Transformers ☆25 · Updated last month
- ☆67 · Updated 2 years ago
- Annotation management for Prodigy that supports multiple users working on many projects ☆15 · Updated 6 years ago
- Python package for deduplication/entity resolution using active learning ☆78 · Updated 2 months ago
- Extract city and country mentions from text, like GeoText but without regex, using FlashText, an Aho-Corasick implementation. ☆60 · Updated this week
- Graph Engine for Exploration and Search ☆40 · Updated 9 months ago
- Performance evaluation of nearest neighbor search using Vespa, Elasticsearch and Open Distro for Elasticsearch K-NN ☆116 · Updated 3 years ago
- Text readability metrics in Python. ☆12 · Updated 11 years ago
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆168 · Updated 3 years ago
- Sentence Transformers models for spaCy ☆105 · Updated last year
- A spaCy wrapper for DBpedia Spotlight ☆105 · Updated last year
- Training/test data for Dragnet ☆41 · Updated 9 years ago