hybridtheory / floc-simhash
A fast python implementation of the SimHash algorithm.
☆27 · Updated 3 years ago
Alternatives and similar repositories for floc-simhash
Users interested in floc-simhash are comparing it to the libraries listed below:
- Abydos NLP/IR library for Python ☆188 · Updated 2 years ago
- A comprehensive and scalable set of string tokenizers and similarity measures in Python ☆140 · Updated last year
- ☆69 · Updated 3 years ago
- ☆30 · Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆164 · Updated 2 years ago
- A machine learning tool for fishing entities ☆263 · Updated 2 months ago
- Sentence Transformers models for spaCy ☆107 · Updated 2 years ago
- Information extraction from English and German texts based on predicate logic ☆138 · Updated 2 years ago
- Fuzzy matching and more functionality for spaCy ☆256 · Updated last year
- Blazing-fast topic modelling for short texts ☆33 · Updated last month
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 · Updated 2 years ago
- Locality Sensitive Hashing using MinHash in Python/Cython to detect near-duplicate text documents ☆290 · Updated 2 years ago
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆218 · Updated 6 months ago
- Language detection using spaCy and fastText ☆57 · Updated last year
- Train a model, and detect gibberish strings with it ☆64 · Updated 3 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆168 · Updated 2 months ago
- spaCy entry points for Curated Transformers ☆32 · Updated 2 months ago
- 📂 Additional lookup tables and data resources for spaCy ☆108 · Updated 2 months ago
- Multi-Language Identification ☆28 · Updated last year
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆178 · Updated 7 months ago
- Code and data used to build a training dataset for dragnet models ☆10 · Updated 4 years ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆125 · Updated last year
- DaCy: The state-of-the-art Danish NLP pipeline using spaCy ☆97 · Updated 7 months ago
- DataFrame integration with spaCy ☆103 · Updated 4 years ago
- Extracts a latent knowledge graph from text and indexes/queries it in Elasticsearch or Solr ☆21 · Updated 3 years ago
- Faster, modernized fork of the language identification tool langid.py ☆56 · Updated 8 months ago
- Find strings/words in text; convenience and C speed ☆126 · Updated 2 years ago
- Text tokenization and sentence segmentation (segtok v2) ☆205 · Updated 3 years ago
- Boolean text search in Python ☆45 · Updated last month
- 🔢 Work with static vector models ☆28 · Updated 3 months ago