hybridtheory / floc-simhash
A fast Python implementation of the SimHash algorithm.
☆27 Updated 4 years ago
Alternatives and similar repositories for floc-simhash
Users who are interested in floc-simhash are comparing it to the libraries listed below.
- ☆68 Updated 3 years ago
- Fuzzy matching and more functionality for spaCy. ☆259 Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆169 Updated 3 years ago
- ☆30 Updated 3 years ago
- Boolean text search in Python ☆46 Updated 6 months ago
- Abydos NLP/IR library for Python ☆193 Updated 3 years ago
- A machine learning tool for fishing entities ☆267 Updated 7 months ago
- Detect and visualize text reuse ☆119 Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆155 Updated 2 years ago
- Code and data used to build a training dataset for dragnet models ☆10 Updated 5 years ago
- Find strings/words in text; convenience and C speed ☆126 Updated 3 years ago
- Language detection using spaCy and fastText ☆57 Updated 2 years ago
- Parse natural language time expressions in Python ☆131 Updated 3 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆143 Updated 2 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆95 Updated 2 years ago
- ☆70 Updated 3 years ago
- Sentence transformers models for SpaCy ☆109 Updated 2 years ago
- Weighted Levenshtein library ☆113 Updated last month
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆195 Updated last week
- Faster, modernized fork of the language identification tool langid.py ☆61 Updated last year
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆220 Updated 11 months ago
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆62 Updated last week
- Dataframe Integration with spaCy. ☆103 Updated 4 years ago
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆170 Updated 4 years ago
- Communication about the pseudonymization engine of the Cour de Cassation ☆18 Updated 2 years ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆149 Updated last year
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents ☆292 Updated 2 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆182 Updated 7 months ago
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆331 Updated 8 months ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆130 Updated 2 weeks ago