hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for floc-simhash
- ☆29 · Updated 2 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆152 · Updated 2 years ago
- ☆66 · Updated 2 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆91 · Updated last year
- A machine learning tool for fishing entities ☆245 · Updated 2 months ago
- Fuzzy matching and more functionality for spaCy. ☆252 · Updated 4 months ago
- Recon NER: debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 · Updated 8 months ago
- Scalable String Similarity Joins in Python ☆39 · Updated 4 months ago
- A Named-Entity Recogniser based on Grobid. ☆49 · Updated 2 months ago
- Faster, modernized fork of the language identification tool langid.py ☆48 · Updated 4 months ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- Multi-Language Identification ☆28 · Updated 3 months ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆137 · Updated 3 months ago
- Language detection using spaCy and fastText ☆54 · Updated 10 months ago
- ☆70 · Updated last year
- Information extraction from English and German texts based on predicate logic ☆135 · Updated last year
- Abydos NLP/IR library for Python ☆183 · Updated 2 years ago
- DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆50 · Updated 4 years ago
- Notebooks configured to be run with Binder, usually found on my blog. ☆41 · Updated last year
- Hidden alignment conditional random field for classifying string pairs. ☆25 · Updated last month
- 📂 Additional lookup tables and data resources for spaCy ☆98 · Updated last year
- ☄️ Parallel and distributed training with spaCy and Ray ☆54 · Updated last year
- DataFrame integration with spaCy. ☆101 · Updated 3 years ago
- Blazing-fast topic modelling for short texts. ☆31 · Updated last month
- Python 3 library for processing historical English ☆64 · Updated 3 months ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 · Updated 2 years ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆67 · Updated last week
- Anonymization of legal cases (Fr) based on Flair embeddings ☆87 · Updated 3 years ago
- Extract city and country mentions from text, like GeoText, but with FlashText (an Aho-Corasick implementation) instead of regex. ☆60 · Updated this week
- A fuzzy matching & clustering library for Python. ☆26 · Updated last year