domanchi / gibberish-detector
Train a model, and detect gibberish strings with it.
☆67 Updated 3 years ago
Alternatives and similar repositories for gibberish-detector
Users that are interested in gibberish-detector are comparing it to the libraries listed below
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆149 Updated last week
- Fast and robust date extraction from web pages, with Python or on the command-line ☆141 Updated last week
- 80x faster and 95% accurate language identification with Fasttext ☆161 Updated last year
- A python package to simulate typographical errors. ☆38 Updated last year
- Find strings/words in text; convenience and C speed ☆127 Updated 3 years ago
- A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII) ☆94 Updated last month
- Compare html similarity using structural and style metrics ☆214 Updated 2 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆154 Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆110 Updated last year
- A fast python implementation of the SimHash algorithm. ☆27 Updated 4 years ago
- An efficient algorithm for k-bounded (Damerau-)Levenshtein distance ☆16 Updated 7 years ago
- ☆69 Updated 3 years ago
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ☆83 Updated 10 months ago
- Parse natural language time expressions in python ☆131 Updated 2 years ago
- A framework for converting natural language text inputs to corresponding Pandas, MongoDB, Kusto and Neo4j (Cypher) queries. ☆91 Updated last year
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆48 Updated 6 years ago
- A python based HTML to text conversion library, command line client and Web service. ☆323 Updated 2 weeks ago
- 🖍️ Highlight text in documents ☆109 Updated 6 months ago
- ☆175 Updated 7 months ago
- Scripts for Medium articles ☆62 Updated last year
- 🐍 A CPython extension for the Hyperscan regular expression matching library. ☆186 Updated 2 weeks ago
- Simply, faster, sentence-transformers ☆143 Updated last year
- Python package for deduplication/entity resolution using active learning ☆82 Updated last year
- Multi-Language Identification ☆28 Updated last year
- ☆43 Updated 2 years ago
- 🔢 Work with static vector models ☆31 Updated 6 months ago
- Python API for https://vespa.ai, the open big data serving engine ☆146 Updated last week
- Fuzzy matching and more functionality for spaCy. ☆258 Updated last year
- Simple heuristic for measuring web page similarity (& data set) ☆91 Updated 7 years ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆62 Updated this week