sybrenjansen / text-scrubber
Python package that offers text scrubbing functionality, providing building blocks for string cleaning as well as normalizing geographical text (countries/states/cities)
☆22 Updated 8 months ago
Alternatives and similar repositories for text-scrubber:
Users who are interested in text-scrubber are comparing it to the libraries listed below.
- ☆30 Updated 2 years ago
- Versatile Metrics Collection for Python ☆19 Updated last year
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- A simple and streamlined Python script to extract and filter links from a remote HTML resource. ☆24 Updated 3 months ago
- Language detection using Spacy and Fasttext ☆55 Updated last year
- A graph query engine ☆16 Updated 3 weeks ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated 2 years ago
- ☆70 Updated 2 years ago
- ☆69 Updated 3 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆35 Updated last year
- Load embeddings and featurize your sentences. ☆28 Updated 6 months ago
- Declarative layer for your database. ☆37 Updated 2 years ago
- A simple library for training named entity recognition model from partially annotated data ☆23 Updated last year
- Templated docstrings for Python classes ☆16 Updated last year
- A utility for labeling clusters of text data. ☆28 Updated 3 years ago
- Python package for deduplication/entity resolution using active learning ☆79 Updated 8 months ago
- Elemental makes Selenium automation faster and easier. ☆36 Updated last year
- 🌸 Train floret vectors ☆18 Updated 2 years ago
- Scalable String Similarity Joins in Python ☆39 Updated 9 months ago
- A fully customisable language detection pipeline for spaCy ☆92 Updated 6 years ago
- An open-source NLP library: fast text cleaning and preprocessing ☆23 Updated 3 years ago
- Custom Python functions for working with SQLite FTS4 ☆22 Updated 2 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python. ☆21 Updated last year
- A web application for tagging and retrieval of arguments in text ☆28 Updated 2 years ago
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want) ☆66 Updated 2 years ago
- Neural Elastic Inference and Search ☆19 Updated 5 years ago
- AsyncIO serving for data science models ☆24 Updated 2 years ago
- ipython + REPL + coroutines - suffering ☆19 Updated 8 months ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated 7 months ago
- A maximum-strength name parser for record linkage. ☆37 Updated this week