J535D165 / recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
☆997 · Updated last year
Alternatives and similar repositories for recordlinkage:
Users who are interested in recordlinkage also compare it to the libraries listed below:
- A list of free data matching and record linkage software. ☆378 · Updated last year
- Examples for using the dedupe library ☆410 · Updated 8 months ago
- Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends ☆1,542 · Updated this week
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,263 · Updated 4 months ago
- Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4 ☆283 · Updated 2 years ago
- Super Fast String Matching in Python ☆367 · Updated 3 weeks ago
- Simplifies use of the Dedupe library via Pandas ☆135 · Updated 2 years ago
- Python package for performing Entity and Text Matching using Deep Learning. ☆586 · Updated 9 months ago
- ☆189 · Updated 10 months ago
- Python bindings to libpostal for fast international address parsing/normalization ☆803 · Updated 2 months ago
- Company Name Processor written in Python ☆336 · Updated 10 months ago
- Clean APIs for data cleaning. Python implementation of R package Janitor ☆1,410 · Updated this week
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,118 · Updated last week
- Clean personally identifiable information from dirty dirty text. ☆405 · Updated last year
- Fuzzy string matching, grouping, and evaluation. ☆758 · Updated last month
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,502 · Updated 4 months ago
- Python package to accelerate the sparse matrix multiplication and top-n similarity selection ☆404 · Updated last week
- Easy pipelines for pandas DataFrames. ☆719 · Updated 5 months ago
- 📛 Fuzzy Name Matching with Machine Learning ☆264 · Updated 9 months ago
- Resources for tackling record linkage / deduplication / data matching problems ☆122 · Updated last year
- Clean US addresses following USPS pub 28 and RESO guidelines ☆214 · Updated last year
- A spaCy pipeline and model for NLP on unstructured legal text. ☆648 · Updated 8 months ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆500 · Updated 2 months ago
- A library for defensive data analysis. ☆500 · Updated 5 years ago
- Writes the Singer format from Python ☆556 · Updated 2 weeks ago
- Natural Intelligence is still a pretty good idea. ☆807 · Updated 8 months ago
- Full text geoparsing as a Python library ☆747 · Updated 3 years ago
- Data Analysis Baseline Library ☆726 · Updated 3 months ago
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆1,009 · Updated this week
- Test-Driven Data Analysis Functions ☆298 · Updated this week