hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 Updated 4 years ago
Alternatives and similar repositories for floc-simhash
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- code and data used to build a training dataset for dragnet models ☆10 Updated 5 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆169 Updated 3 years ago
- A machine learning tool for fishing entities ☆265 Updated 6 months ago
- ☆69 Updated 3 years ago
- ☆30 Updated 3 years ago
- Fuzzy matching and more functionality for spaCy. ☆259 Updated last year
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆142 Updated last year
- Abydos NLP/IR library for Python ☆192 Updated 3 years ago
- Dataframe Integration with spaCy. ☆103 Updated 4 years ago
- ☆70 Updated 2 years ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆77 Updated 3 weeks ago
- A Named-Entity Recogniser based on Grobid. ☆54 Updated 6 months ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆179 Updated 5 months ago
- 📂 Additional lookup tables and data resources for spaCy ☆113 Updated 5 months ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆147 Updated last year
- Detect and visualize text reuse ☆119 Updated last year
- DaCy: The State of the Art Danish NLP pipeline using SpaCy ☆98 Updated 11 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 Updated 2 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆292 Updated 2 years ago
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆62 Updated this week
- Library for unit extraction - fork of quantulum for python3 ☆145 Updated last year
- Train a model, and detect gibberish strings with it. ☆67 Updated 3 years ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆220 Updated 10 months ago
- Extracts a latent knowledge graph from text and index/query it in elasticsearch or solr ☆21 Updated 3 years ago
- Sentence transformers models for SpaCy ☆109 Updated 2 years ago
- A spaCy wrapper for DBpedia Spotlight ☆112 Updated 2 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆69 Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆61 Updated last year
- ☆55 Updated last year