domanchi / gibberish-detector
Train a model, and detect gibberish strings with it.
☆59 · Updated 2 years ago
Related projects:
- NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, … ☆75 · Updated 6 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆61 · Updated 6 months ago
- An open-source package for Python to clean raw text data ☆68 · Updated last year
- Python package for deduplication/entity resolution using active learning ☆77 · Updated 3 weeks ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆66 · Updated 2 weeks ago
- ☆65 · Updated 2 years ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆99 · Updated 4 months ago
- Find strings/words in text; convenience and C speed ☆125 · Updated 2 years ago
- 80x faster and 95% accurate language identification with fastText ☆131 · Updated 7 months ago
- Information extraction from English and German texts based on predicate logic ☆133 · Updated last year
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆113 · Updated 2 weeks ago
- Few-shot Named Entity Recognition ☆121 · Updated 2 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆63 · Updated 3 years ago
- Language detection using spaCy and fastText ☆53 · Updated 9 months ago
- Fuzzy matching and more functionality for spaCy. ☆249 · Updated 2 months ago
- spaCy entry points for Curated Transformers ☆23 · Updated 2 weeks ago
- Sentence transformers models for spaCy ☆104 · Updated last year
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words. ☆32 · Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆94 · Updated 4 months ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆77 · Updated this week
- ☆41 · Updated last year
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆112 · Updated 4 months ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆151 · Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆149 · Updated 3 months ago
- Simply, faster, sentence-transformers ☆127 · Updated 3 weeks ago
- A Python library for extracting text from PDFs without losing the formatting of the PDF content. ☆72 · Updated 2 years ago
- Google USE (Universal Sentence Encoder) for spaCy ☆176 · Updated last year
- LLM prompt language based on Jinja ☆52 · Updated 2 weeks ago
- Creating class-based TF-IDF matrices ☆81 · Updated last year
- This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions. ☆115 · Updated 3 months ago