dedupeio / dedupe
A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
☆4,381 Updated 2 months ago
Alternatives and similar repositories for dedupe
Users that are interested in dedupe are comparing it to the libraries listed below
- Examples for using the dedupe library ☆415 Updated last year
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,025 Updated last year
- a python library for parsing unstructured United States address strings into address components ☆1,595 Updated 2 months ago
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,154 Updated this week
- Command line tool for deduplicating CSV files ☆430 Updated 5 years ago
- NLP, before and after spaCy ☆2,230 Updated 2 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,199 Updated 2 months ago
- a python library for parsing unstructured western names into name components. ☆608 Updated 4 months ago
- Fuzzy String Matching in Python ☆9,260 Updated 2 years ago
- Python library for interactive topic model visualization. Port of the R LDAvis package. ☆1,840 Updated last year
- Multilingual text (NLP) processing toolkit ☆2,351 Updated last year
- Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends ☆1,733 Updated this week
- 🦆 Contextually-keyed word vectors ☆1,660 Updated 5 months ago
- Python bindings to libpostal for fast international address parsing/normalization ☆844 Updated 7 months ago
- Beautiful visualizations of how language differs among document types. ☆2,314 Updated 5 months ago
- What's in your data? Extract schema, statistics and entities from datasets ☆1,518 Updated last week
- Full text geoparsing as a Python library ☆752 Updated 4 years ago
- A list of free data matching and record linkage software. ☆392 Updated last year
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,519 Updated 10 months ago
- A system for quickly generating training data with weak supervision ☆5,923 Updated last year
- Python Extract Transform and Load Tables of Data ☆1,289 Updated last month
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ☆2,081 Updated last year
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,277 Updated 4 years ago
- Extract Transform Load for Python 3.5+ ☆1,603 Updated 2 years ago
- sqldf for pandas ☆1,353 Updated last year
- ☆3,169 Updated 3 years ago
- Company Name Processor written in Python ☆341 Updated last year
- Quickly and accurately render even the largest data. ☆3,459 Updated last week
- A simple Python module for parsing human names into their individual components ☆688 Updated last year
- NumPy and Pandas interface to Big Data ☆3,200 Updated 2 years ago