hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27Updated 3 years ago
Alternatives and similar repositories for floc-simhash
Users that are interested in floc-simhash are comparing it to the libraries listed below
- Fuzzy matching and more functionality for spaCy.☆257Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata☆164Updated 2 years ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python☆141Updated last year
- ☆69Updated 3 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3)☆154Updated 2 years ago
- Abydos NLP/IR library for Python☆188Updated 2 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents☆291Updated 2 years ago
- Information extraction from English and German texts based on predicate logic☆138Updated 2 years ago
- ☆30Updated 3 years ago
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18☆169Updated 3 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching☆147Updated 10 months ago
- Sentence transformers models for spaCy☆107Updated 2 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆94Updated 2 years ago
- A machine learning tool for fishing entities☆265Updated 3 months ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing☆74Updated 2 months ago
- Blazing fast topic modelling for short texts.☆33Updated last month
- Language detection using spaCy and fastText☆57Updated last year
- A Named-Entity Recogniser based on Grobid.☆54Updated 3 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆171Updated 2 months ago
- Multi-Language Identification☆28Updated last year
- Train a model, and detect gibberish strings with it.☆64Updated 3 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF …☆69Updated 4 years ago
- Code and data used to build a training dataset for dragnet models☆10Updated 4 years ago
- Record Linkage ToolKit (Find and link entities)☆110Updated 2 years ago
- Hidden alignment conditional random field for classifying string pairs.☆24Updated last week
- An efficient simhash implementation for python☆126Updated 5 years ago
- ☆71Updated 2 years ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s…☆219Updated 7 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line☆138Updated last month
- Python text processing, pattern matching, and NLP framework☆66Updated 2 years ago