Simplifies use of the Dedupe library via Pandas
☆135 · Mar 30, 2023 · Updated 3 years ago
Alternatives and similar repositories for pandas-dedupe
Users who are interested in pandas-dedupe are comparing it to the libraries listed below.
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,473 · Jul 29, 2025 · Updated 10 months ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,052 · Feb 21, 2024 · Updated 2 years ago
- Examples for using the dedupe library ☆417 · Aug 10, 2024 · Updated last year
- (Archived) A Python library for record linkage and deduplication. ☆19 · Mar 19, 2024 · Updated 2 years ago
- Flow and transmission cost allocation in power systems ☆17 · Jul 19, 2023 · Updated 2 years ago
- A list of free data matching and record linkage software. ☆406 · Feb 21, 2024 · Updated 2 years ago
- A browser user interface for manual labeling of record pairs. ☆48 · Jun 23, 2023 · Updated 2 years ago
- Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4 ☆286 · Aug 9, 2022 · Updated 3 years ago
- Resources for tackling record linkage / deduplication / data matching problems ☆127 · Feb 22, 2024 · Updated 2 years ago
- Crawlers for Namuwiki, Wikipedia, Daum Blog, Tistory, YouTube, and Nate Pann ☆13 · Feb 20, 2026 · Updated 3 months ago
- Data archive of Blue House (Cheong Wa Dae) national petitions ☆15 · Aug 29, 2020 · Updated 5 years ago
- Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends ☆2,192 · Jun 5, 2026 · Updated last week
- Command line tool for deduplicating CSV files ☆434 · Mar 31, 2020 · Updated 6 years ago
- Fuzzy string matching, grouping, and evaluation. ☆798 · Jul 10, 2025 · Updated 11 months ago
- A content-filtering bypass system developed specifically to allow access to trans-related resources on public networks (libraries, school… ☆27 · Nov 15, 2014 · Updated 11 years ago
- Scripts to download the U.S. Department of Justice's National Caseload Data and load it into Amazon Athena for querying ☆15 · May 22, 2023 · Updated 3 years ago
- R Evolved Generalized Software for Sampling Estimates and Errors in Surveys ☆16 · Oct 3, 2025 · Updated 8 months ago
- PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolut… ☆161 · Nov 18, 2022 · Updated 3 years ago
- Work for Mastering Large Datasets with Python ☆20 · Dec 8, 2022 · Updated 3 years ago
- Pandas in black and white: a collection of opinionated pandas flashcards ☆14 · Feb 15, 2019 · Updated 7 years ago
- Table Enforcer is my attempt to apply a sort of "test driven development" workflow to data cleaning and validation. A python package to f… ☆19 · Feb 26, 2018 · Updated 8 years ago
- Dedupe/batch geocode addresses and venues around the world with libpostal ☆84 · Nov 29, 2021 · Updated 4 years ago
- Source codes and experimental results of our scientific integrity verification system. ☆17 · Oct 15, 2024 · Updated last year
- micro-library to produce a couple of basic, attractive, printable plots with matplotlib ☆11 · Mar 4, 2018 · Updated 8 years ago
- Open modelling of European power systems in Python: a proof-of-concept ☆41 · Jan 13, 2023 · Updated 3 years ago
- Snakemake with pytest example ☆11 · Jul 18, 2020 · Updated 5 years ago
- 🍯 Sweet simple static site generator with Dune vibes ☆14 · Feb 17, 2026 · Updated 3 months ago
- 🚀 Implementation of easy-to-use 3D parallelism based on Huggingface Transformers & Microsoft DeepSpeed ☆31 · Feb 5, 2022 · Updated 4 years ago
- Library for unit extraction - fork of quantulum for python3 ☆148 · May 19, 2026 · Updated 3 weeks ago
- A pytorch implementation of "SuperTML: Two-Dimensional Word Embedding for the Precognition on Structured Tabular Data" ☆29 · Jul 23, 2019 · Updated 6 years ago
- ☆13 · Dec 13, 2022 · Updated 3 years ago
- Python script for matching a list of messy addresses against a gazetteer using dedupe. ☆64 · Mar 31, 2020 · Updated 6 years ago
- A curated list of ML awesome frameworks & libraries for text data ☆17 · Mar 14, 2023 · Updated 3 years ago
- NextGIS build of QGIS ☆27 · May 14, 2026 · Updated 3 weeks ago
- Combination of the RapidFuzz library with Spacy PhraseMatcher ☆11 · Sep 29, 2021 · Updated 4 years ago
- Simple API serving for Python ML models ☆32 · Nov 22, 2022 · Updated 3 years ago
- Implementation of stop sequencer for Huggingface Transformers ☆16 · Jun 6, 2023 · Updated 3 years ago
- Line of business tooling for VOIP services. ☆11 · Updated this week
- Python script for processing DrugBank XML to MySQL-ready CSV files ☆19 · Mar 6, 2017 · Updated 9 years ago