domanchi / gibberish-detector
Train a model, and detect gibberish strings with it.
☆60 · Updated 3 years ago
Alternatives and similar repositories for gibberish-detector:
Users who are interested in gibberish-detector are comparing it to the libraries listed below.
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆104 · Updated 9 months ago
- ☆68 · Updated 2 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆33 · Updated 10 months ago
- A python utility for downloading Common Crawl data ☆232 · Updated last year
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆68 · Updated 2 weeks ago
- spaCy entry points for Curated Transformers ☆26 · Updated 4 months ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆134 · Updated last month
- Language detection using Spacy and Fasttext ☆55 · Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line ☆122 · Updated last month
- Summarize. is a Streamlit application that performs automatic text summarization using both extractive and abstractive models. ☆16 · Updated 3 years ago
- Python package for deduplication/entity resolution using active learning ☆76 · Updated 5 months ago
- Find strings/words in text; convenience and C speed ☆126 · Updated 2 years ago
- ☆42 · Updated last week
- Script for downloading GitHub. ☆90 · Updated 7 months ago
- Automatically check mismatch between code and comments using AI and ML ☆53 · Updated 3 years ago
- Multi-Language Identification ☆29 · Updated 6 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated 11 months ago
- This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions. ☆123 · Updated 8 months ago
- An open-source package for python to clean raw text data ☆69 · Updated last year
- ☆168 · Updated 8 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 · Updated last year
- Detecting gibberish as a type of sentiment analysis with GPT2 ☆24 · Updated 4 years ago
- Build and upload fastText Python wheels to PyPI ☆23 · Updated last year
- 🔎 A Prodigy plugin for evaluating spaCy pipelines ☆13 · Updated 10 months ago
- NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, … ☆78 · Updated 2 months ago
- ☆63 · Updated 2 months ago
- Find and fix bugs in natural language machine learning models using adaptive testing. ☆181 · Updated 9 months ago
- Python port of Boilerpipe library ☆86 · Updated 5 months ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆193 · Updated 2 years ago
- A fully customisable language detection pipeline for spaCy ☆92 · Updated 5 years ago