dedupeio / dedupe
A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
☆4,299 Updated 6 months ago
Alternatives and similar repositories for dedupe
Users who are interested in dedupe are comparing it to the libraries listed below.
- Examples for using the dedupe library ☆412 Updated 9 months ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,007 Updated last year
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,137 Updated 2 weeks ago
- Command line tool for deduplicating CSV files ☆421 Updated 5 years ago
- a python library for parsing unstructured United States address strings into address components ☆1,571 Updated 2 weeks ago
- NLP, before and after spaCy ☆2,225 Updated last year
- a python library for parsing unstructured western names into name components. ☆607 Updated 2 weeks ago
- A list of free data matching and record linkage software. ☆383 Updated last year
- Utils for streaming large files (S3, HDFS, gzip, bz2...) ☆3,319 Updated 2 weeks ago
- PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph a… ☆2,260 Updated this week
- 📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage. ☆3,466 Updated last month
- DeepDive ☆1,965 Updated 2 years ago
- A collection of common regular expressions bundled with an easy to use interface. ☆1,576 Updated 2 years ago
- A toolkit for making domain-specific probabilistic parsers ☆802 Updated 8 months ago
- extract text from any document. no muss. no fuss. ☆4,145 Updated 5 months ago
- A next-generation curated knowledge sharing platform for data scientists and other technical professions. ☆5,520 Updated 8 months ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,175 Updated 10 months ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆1,272 Updated 3 years ago
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ☆2,080 Updated last year
- Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data ☆749 Updated last month
- Beautiful visualizations of how language differs among document types. ☆2,302 Updated last month
- A curated list of awesome ETL frameworks, libraries, and software. ☆3,405 Updated 10 months ago
- 🦆 Contextually-keyed word vectors ☆1,653 Updated last month
- Fuzzy String Matching in Python ☆9,257 Updated 2 years ago
- Extract Transform Load for Python 3.5+ ☆1,591 Updated 2 years ago
- sqldf for pandas ☆1,348 Updated 10 months ago
- Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. ☆8,817 Updated 11 months ago
- A Python data analysis library that is optimized for humans instead of machines. ☆1,181 Updated 3 months ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,588 Updated last month
- python parser for human readable dates ☆2,663 Updated this week