LeapBeyond / scrubadub
Clean personally identifiable information from dirty dirty text.
☆401 · Updated last year
Alternatives and similar repositories for scrubadub:
Users who are interested in scrubadub are comparing it to the libraries listed below.
- A Python library for parsing unstructured Western names into name components. ☆599 · Updated 3 months ago
- Group thousands of similar spreadsheet or database text entries in seconds ☆156 · Updated last year
- 🦉 Modern high-performance serialization utilities for Python (JSON, MessagePack, Pickle) ☆449 · Updated 2 weeks ago
- A toolkit for making domain-specific probabilistic parsers ☆798 · Updated 4 months ago
- A Python module to convert natural language numerics into ints and floats. ☆225 · Updated 4 months ago
- Super lightweight function registries for your library ☆176 · Updated 7 months ago
- Tools for test driven data-wrangling and data validation. ☆295 · Updated 3 years ago
- Company Name Processor written in Python ☆332 · Updated 8 months ago
- Easy pipelines for pandas DataFrames. ☆718 · Updated 2 months ago
- Python bindings to libpostal for fast international address parsing/normalization ☆778 · Updated 7 months ago
- A Python library for working with Table Schema. ☆260 · Updated 2 months ago
- Test-Driven Data Analysis Functions ☆296 · Updated last week
- Python address detector and parser ☆206 · Updated last year
- Tutorial code and data for the entity resolution workshops. ☆44 · Updated 9 years ago
- A library for defensive data analysis. ☆500 · Updated 5 years ago
- Easy to use test framework for Jupyter Notebooks ☆307 · Updated 2 years ago
- Fuzzy string matching, grouping, and evaluation. ☆751 · Updated last month
- Tools for exploratory data analysis in Python ☆644 · Updated last year
- Command line tool for deduplicating CSV files ☆413 · Updated 4 years ago
- Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon. ☆513 · Updated 3 weeks ago
- Simplifies use of the Dedupe library via Pandas ☆135 · Updated last year
- The goal of pandas-log is to provide feedback about basic pandas operations. It provides simple wrapper functions for the most common fun… ☆215 · Updated 3 years ago
- Smarter Manual Annotation for Resource-constrained collection of Training data ☆224 · Updated last month
- Clean US addresses following USPS pub 28 and RESO guidelines ☆208 · Updated last year
- Text Mining and Topic Modeling Toolkit for Python with parallel processing power ☆191 · Updated last year
- Find dates inside text using Python and get back datetime objects ☆639 · Updated 8 months ago
- Super Fast String Matching in Python ☆363 · Updated this week
- Dataframe Integration with spaCy. ☆103 · Updated 3 years ago
- A command line tool to easily add an ethics checklist to your data science projects. ☆291 · Updated 6 months ago
- Fuzzy matching and more functionality for spaCy. ☆255 · Updated 6 months ago