domanchi / gibberish-detector
Train a model, and detect gibberish strings with it.
☆62 · Updated 3 years ago
Alternatives and similar repositories for gibberish-detector
Users who are interested in gibberish-detector are comparing it to the libraries listed below.
- spaCy entry points for Curated Transformers ☆31 · Updated last week
- 80x faster and 95% accurate language identification with Fasttext ☆155 · Updated last year
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆109 · Updated last year
- Python package for deduplication/entity resolution using active learning ☆80 · Updated 9 months ago
- Targeted language identifier, based on FastText and Hunspell. ☆34 · Updated 3 months ago
- ☆69 · Updated 3 years ago
- ☆43 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- A Python package to simulate typographical errors. ☆35 · Updated last year
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words. ☆33 · Updated 2 years ago
- Multi-Language Identification ☆28 · Updated 10 months ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆67 · Updated last month
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆73 · Updated last month
- Simply, faster, sentence-transformers ☆142 · Updated 9 months ago
- Notebooks for training universal 0-shot classifiers on many different tasks ☆127 · Updated 5 months ago
- Index Common Crawl archives in tabular format ☆120 · Updated 3 weeks ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆160 · Updated 3 weeks ago
- Measure the readability of a given text using surface characteristics ☆79 · Updated 4 months ago
- Abydos NLP/IR library for Python ☆186 · Updated 2 years ago
- A CLI tool for managing OpenAI batch processing jobs with ease. ☆36 · Updated last month
- Few-shot Named Entity Recognition ☆123 · Updated 3 years ago
- A robust web archive analytics toolkit ☆108 · Updated 2 months ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆135 · Updated 6 months ago
- Fuzzy matching and more functionality for spaCy. ☆256 · Updated 11 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆129 · Updated 5 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆79 · Updated last year
- Language detection using spaCy and Fasttext ☆55 · Updated last year
- An open-source package for Python to clean raw text data ☆70 · Updated last year
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆45 · Updated 6 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 · Updated 2 months ago