casics / nostril
Nostril: Nonsense String Evaluator
☆190 · Updated 2 years ago
Related projects:
- A small program to detect gibberish using a Markov Chain · ☆597 · Updated 7 months ago
- Compare html similarity using structural and style metrics · ☆209 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) · ☆148 · Updated last year
- Python wrapper for ssdeep fuzzy hashing library · ☆152 · Updated 2 years ago
- Tokenizer for raw mails · ☆367 · Updated 5 months ago
- ☆159 · Updated 3 months ago
- A lucene query parser generating ElasticSearch queries and more! · ☆188 · Updated 2 weeks ago
- Train a model, and detect gibberish strings with it. · ☆59 · Updated 2 years ago
- 🐍 A CPython extension for the Hyperscan regular expression matching library. · ☆165 · Updated 6 months ago
- Simple heuristic for measuring web page similarity (& data set) · ☆89 · Updated 6 years ago
- Python wrapper for RE2 · ☆98 · Updated last week
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. · ☆364 · Updated last year
- Parse natural language time expressions in python · ☆131 · Updated last year
- Find strings/words in text; convenience and C speed · ☆125 · Updated 2 years ago
- Fuzzy matching and more functionality for spaCy. · ☆249 · Updated 2 months ago
- Ultimate Website Sitemap Parser · ☆178 · Updated last year
- Textpipe: clean and extract metadata from text · ☆300 · Updated 3 years ago
- An efficient simhash implementation for python · ☆124 · Updated 4 years ago
- Abydos NLP/IR library for Python · ☆180 · Updated last year
- URL normalization for Python · ☆94 · Updated 2 years ago
- A python utility for downloading Common Crawl data · ☆220 · Updated last year
- Super Fast String Matching in Python · ☆362 · Updated 4 months ago
- Library for unit extraction - fork of quantulum for python3 · ☆134 · Updated 2 months ago
- 🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle) · ☆425 · Updated 2 months ago
- Package to facilitate URL clustering · ☆68 · Updated 8 years ago
- Locality-sensitive hashing algorithm for text similarity comparisons · ☆58 · Updated 2 years ago
- Pure python Aho-Corasick library. · ☆209 · Updated last year
- Extracts the top level domain (TLD) from the URL given. · ☆177 · Updated last year
- Character-based word embeddings model based on RNN for handling real world texts · ☆172 · Updated 11 months ago
- Python port of Boilerpipe library · ☆81 · Updated last month