domanchi / gibberish-detector
Train a model, and detect gibberish strings with it.
☆62 Updated 3 years ago
Alternatives and similar repositories for gibberish-detector
Users who are interested in gibberish-detector are comparing it to the libraries listed below.
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆109 Updated last year
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words. ☆33 Updated 2 years ago
- ☆47 Updated 2 years ago
- ☆86 Updated 2 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- ☆43 Updated 2 years ago
- Legal document classification with EuroVoc descriptors on 22 languages. ☆26 Updated 2 years ago
- Index Common Crawl archives in tabular format ☆122 Updated last month
- Pythonic search engine based on PyLucene. ☆128 Updated 7 months ago
- Common crawl extractor ☆76 Updated last year
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models ☆81 Updated last year
- Language detection using Spacy and Fasttext ☆55 Updated last year
- NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, … ☆82 Updated 7 months ago
- ☆69 Updated 3 years ago
- A python package to simulate typographical errors. ☆35 Updated last year
- Entity linking evaluation and analysis tool ☆23 Updated 2 months ago
- ☆171 Updated 3 months ago
- 🔎 A Prodigy plugin for evaluating spaCy pipelines ☆13 Updated last year
- This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-s… ☆216 Updated 5 months ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆45 Updated 6 years ago
- Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs). ☆71 Updated 10 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆152 Updated 2 years ago
- An open-source package for python to clean raw text data ☆70 Updated last year
- ☆78 Updated last year
- Homoglyphs: get similar letters, convert to ASCII, detect possible languages and UTF-8 group. ☆82 Updated 4 years ago
- Named entity recognition for the legal domain ☆42 Updated 4 years ago
- A fully customisable language detection pipeline for spaCy ☆93 Updated 6 years ago
- Vespa application making an index of the CORD-19 dataset. ☆39 Updated 5 months ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 4 years ago
- Toolkit for domain-specific information retrieval experimentation ☆18 Updated 2 weeks ago