domanchi / gibberish-detector
Train a model, and detect gibberish strings with it.
☆61Updated 3 years ago
Alternatives and similar repositories for gibberish-detector:
Users who are interested in gibberish-detector are comparing it to the libraries listed below:
- An open-source package for Python to clean raw text data☆69Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3)☆151Updated last year
- Script for downloading GitHub.☆93Updated 9 months ago
- Fuzzy matching and more functionality for spaCy.☆256Updated 9 months ago
- Python package for deduplication/entity resolution using active learning☆78Updated 8 months ago
- Language detection using spaCy and fastText☆55Updated last year
- Build and upload fastText Python wheels to PyPI☆23Updated last year
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF …☆66Updated 4 years ago
- Information extraction from English and German texts based on predicate logic☆135Updated last year
- A fully customisable language detection pipeline for spaCy☆92Updated 5 years ago
- Google USE (Universal Sentence Encoder) for spaCy☆184Updated 2 years ago
- Multi-Language Identification☆28Updated 9 months ago
- Targeted language identifier, based on FastText and Hunspell.☆34Updated 2 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy☆63Updated last year
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters☆137Updated 3 months ago
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)☆61Updated last year
- Label data using HuggingFace's transformers and automatically get a prediction service☆185Updated last year
- Python package that offers text scrubbing functionality, providing building blocks for string cleaning as well as normalizing geographica…☆22Updated 8 months ago
- Open source library for few shot NLP☆78Updated last year
- Pythonic search engine based on PyLucene.☆125Updated 5 months ago
- A robust web archive analytics toolkit☆103Updated 3 weeks ago
- 🚂 Fine-tune OpenAI models for text classification, question answering, and more☆16Updated last year
- A Python package to simulate typographical errors.☆34Updated last year
- Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data☆102Updated 3 years ago
- Simply, faster, sentence-transformers☆141Updated 8 months ago