prasanthg3 / cleantext
An open-source Python package for cleaning raw text data
☆69 · Updated last year
Alternatives and similar repositories for cleantext:
Users interested in cleantext are comparing it to the libraries listed below:
- NeatText: a simple NLP package for cleaning textual data and text preprocessing ☆71 · Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆96 · Updated 9 months ago
- Dataframe Integration with spaCy. ☆103 · Updated 3 years ago
- Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated 10 months ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 · Updated 2 years ago
- Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy. ☆117 · Updated 10 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 · Updated 8 months ago
- ☆54 · Updated last year
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- Python package for deduplication/entity resolution using active learning ☆76 · Updated 5 months ago
- Information extraction from English and German texts based on predicate logic ☆135 · Updated last year
- Language detection using spaCy and fastText ☆55 · Updated last year
- Bag of, not words, but tricks! ☆68 · Updated last year
- A monolingual and cross-lingual meta-embedding generation and evaluation framework ☆80 · Updated 2 years ago
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated last year
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆157 · Updated 2 years ago
- Blazing fast language detection using fastText model ☆23 · Updated 2 years ago
- Parallel and distributed training with spaCy and Ray ☆53 · Updated last year
- Creating class-based TF-IDF matrices ☆82 · Updated 2 years ago
- A Python package to simulate typographical errors. ☆31 · Updated last year
- Sentence transformers models for spaCy ☆107 · Updated last year
- ☆68 · Updated 2 years ago
- Easy PDF to text to spaCy text extraction in Python. ☆38 · Updated 4 months ago
- Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further lang… ☆120 · Updated 9 months ago
- A data labelling tool based on Streamlit. ☆23 · Updated 3 years ago
- ☆42 · Updated last year
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 · Updated 11 months ago
- Streamlit demo app to demonstrate the features of transformers interpret with multiple models. ☆25 · Updated 3 years ago
- Extract city and country mentions from text like GeoText, without regex but with FlashText, an Aho-Corasick implementation. ☆60 · Updated this week
- A spaCy wrapper for DBpedia Spotlight ☆107 · Updated last year