sybrenjansen / text-scrubber
Python package that offers text scrubbing functionality, providing building blocks for string cleaning as well as normalizing geographical text (countries/states/cities)
☆22Updated 6 months ago
Alternatives and similar repositories for text-scrubber:
Users who are interested in text-scrubber are comparing it to the libraries listed below.
- ☆30Updated 2 years ago
- ☆68Updated 3 years ago
- Language detection using spaCy and fastText☆55Updated last year
- ☆70Updated 2 years ago
- spaCy match and replace, maintaining conjugation☆35Updated 2 years ago
- Versatile Metrics Collection for Python☆18Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization.☆22Updated 2 years ago
- A utility for labeling clusters of text data.☆28Updated 3 years ago
- Python package for deduplication/entity resolution using active learning☆76Updated 6 months ago
- ☆63Updated 3 months ago
- A simple and streamlined Python script to extract and filter links from a remote HTML resource.☆24Updated 2 months ago
- Bringing semantic search to Django. Integrates seamlessly with Django ORM.☆32Updated 5 months ago
- Elemental makes Selenium automation faster and easier.☆36Updated last year
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe…☆67Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy☆63Updated last year
- Declarative layer for your database.☆37Updated 2 years ago
- Custom Python functions for working with SQLite FTS4☆22Updated 2 years ago
- Scalable String Similarity Joins in Python☆38Updated 8 months ago
- Generate a SQLite database from Wikipedia & Wikidata dumps.☆33Updated 11 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax.☆59Updated 10 months ago
- WASM-powered sandbox implementation of exec() for safely running dynamic Python code☆33Updated last year
- ipython + REPL + coroutines - suffering☆18Updated 6 months ago
- A maximum-strength name parser for record linkage.☆36Updated last month
- Multi-Language Identification☆29Updated 7 months ago
- 🌸 Train floret vectors☆18Updated last year
- AsyncIO serving for data science models☆24Updated 2 years ago
- Named entity recognition for the legal domain☆42Updated 3 years ago
- The NLP Bias Identification Toolkit☆36Updated last year
- An AI extension for IPython that makes it work like Cursor☆62Updated 2 months ago