hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 · Updated 4 years ago
Alternatives and similar repositories for floc-simhash
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆292 · Updated 2 years ago
- Abydos NLP/IR library for Python ☆193 · Updated 3 years ago
- ☆68 · Updated 3 years ago
- Fuzzy matching and more functionality for spaCy. ☆259 · Updated last year
- An efficient simhash implementation for python ☆127 · Updated 6 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆169 · Updated 3 years ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆142 · Updated last year
- Blazing fast topic modelling for short texts. ☆34 · Updated this week
- Record Linkage ToolKit (Find and link entities) ☆111 · Updated 2 years ago
- Information extraction from English and German texts based on predicate logic ☆139 · Updated 2 years ago
- A machine learning tool for fishing entities ☆267 · Updated 7 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆155 · Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆61 · Updated last year
- Train a model, and detect gibberish strings with it. ☆67 · Updated 3 years ago
- 📂 Additional lookup tables and data resources for spaCy ☆113 · Updated 7 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆142 · Updated 2 months ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆195 · Updated last week
- ☆70 · Updated 3 years ago
- ☆30 · Updated 3 years ago
- Boolean text search in Python ☆46 · Updated 6 months ago
- Sentence transformers models for SpaCy ☆109 · Updated 2 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆130 · Updated 2 weeks ago
- Locality-sensitive hashing algorithm for text similarity comparisons ☆59 · Updated 9 months ago
- Find strings/words in text; convenience and C speed ☆126 · Updated 3 years ago
- Dataframe Integration with spaCy. ☆103 · Updated 4 years ago
- Detect and visualize text reuse ☆119 · Updated last year
- An index data structure for approximate string search. ☆23 · Updated 6 years ago
- Extract text from HTML ☆134 · Updated 5 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 · Updated last month
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆149 · Updated last year