dedupeio / dedupe
A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
☆4,210 Updated 2 months ago
Alternatives and similar repositories for dedupe:
Users who are interested in dedupe are comparing it to the libraries listed below:
- Examples for using the dedupe library ☆409 Updated 5 months ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆977 Updated 11 months ago
- A toolkit for making domain-specific probabilistic parsers ☆798 Updated 4 months ago
- Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends ☆1,444 Updated last week
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,085 Updated 3 weeks ago
- a python library for parsing unstructured western names into name components. ☆599 Updated 3 months ago
- a python library for parsing unstructured United States address strings into address components ☆1,545 Updated 4 months ago
- Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow ☆2,738 Updated 3 years ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,492 Updated last month
- NLP, before and after spaCy ☆2,214 Updated last year
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ☆2,562 Updated 10 months ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,330 Updated 3 months ago
- sqldf for pandas ☆1,344 Updated 6 months ago
- A system for quickly generating training data with weak supervision ☆5,824 Updated 8 months ago
- Lifetime value in Python ☆1,456 Updated 7 months ago
- A simple Python module for parsing human names into their individual components ☆664 Updated 8 months ago
- NumPy and Pandas interface to Big Data ☆3,190 Updated last year
- Fixes mojibake and other glitches in Unicode text, after the fact. ☆3,848 Updated 2 months ago
- Clean APIs for data cleaning. Python implementation of R package Janitor ☆1,387 Updated this week
- ☆3,157 Updated 3 years ago
- Scalable identity resolution, entity resolution, data mastering and deduplication using ML ☆975 Updated this week
- Visualizations for machine learning datasets ☆7,359 Updated last year
- Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. ☆8,762 Updated 7 months ago
- Python Extract Transform and Load Tables of Data ☆1,255 Updated 8 months ago
- Data Migration for the Blaze Project ☆1,003 Updated 2 years ago
- the portable Python dataframe library ☆5,466 Updated this week
- Python package for performing Entity and Text Matching using Deep Learning. ☆571 Updated 7 months ago
- DeepDive ☆1,960 Updated 2 years ago
- Learning embeddings for classification, retrieval and ranking. ☆3,951 Updated 2 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. ☆4,309 Updated 4 months ago