dedupeio/dedupe-examples

Examples for using the dedupe library

☆417

Alternatives and similar repositories for dedupe-examples

Users that are interested in dedupe-examples are comparing it to the libraries listed below.

dedupeio / dedupe
A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
☆4,487 · Jul 29, 2025 · Updated 11 months ago
dedupeio / csvdedupe
Command line tool for deduplicating CSV files
☆435 · Mar 31, 2020 · Updated 6 years ago
J535D165 / data-matching-software
A list of free data matching and record linkage software.
☆406 · Feb 21, 2024 · Updated 2 years ago
trevorprater / serf
Stanford Entity-Resolution Framework
☆24 · Jun 23, 2018 · Updated 8 years ago
vintasoftware / deduplication-slides
"1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook
☆84 · Dec 8, 2022 · Updated 3 years ago
Lyonk71 / pandas-dedupe
Simplifies use of the Dedupe library via Pandas
☆137 · Mar 30, 2023 · Updated 3 years ago
dedupeio / pyhacrf
Hidden alignment conditional random field for classifying string pairs.
☆24 · Jan 12, 2026 · Updated 6 months ago
aporia-records / APORIA-Works-Registration
A PHP library for reading, writing and manipulating CISAC Common Works Registration (CWR) v2.1R7 and v2.2 files
☆16 · Sep 6, 2018 · Updated 7 years ago
weso / CWR-Validator
Service for parsing and processing data from Common Works Registration (CWR) standard formats.
☆16 · Jul 31, 2015 · Updated 10 years ago
DistrictDataLabs / dedupe-examples
Examples for using the dedupe library
☆10 · Feb 22, 2016 · Updated 10 years ago
iesl / learned-string-alignments
Learning String Alignments for Entity Aliases
☆37 · Mar 21, 2019 · Updated 7 years ago
dedupeio / fuzzycategory
Fuzzy Categorical Distances
☆14 · Mar 31, 2020 · Updated 6 years ago
scify / jedai-ui
UI for JedAI Toolkit
☆17 · May 20, 2022 · Updated 4 years ago
datamade / usaddress
a python library for parsing unstructured United States address strings into address components
☆1,631 · Aug 7, 2025 · Updated 11 months ago
gabfl / dbschema
MySQL/PostgreSQL schema migrations made easy
☆16 · May 9, 2023 · Updated 3 years ago
RobinL / fuzzymatcher
Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4
☆286 · Aug 9, 2022 · Updated 3 years ago
dice-group / LIMES
Link Discovery Framework for Metric Spaces.
☆133 · Sep 16, 2025 · Updated 10 months ago
datamade / parserator
A toolkit for making domain-specific probabilistic parsers
☆812 · Sep 26, 2024 · Updated last year
vphill / metadata_breakers
Python script for breaking or atomizing OAI-PMH repositories into simpler text formats
☆26 · Sep 25, 2022 · Updated 3 years ago
OpenBibframe / bibframe-ontology
A repository for reviewing and refining the LC bibframe ontology
☆11 · Mar 14, 2017 · Updated 9 years ago
dedupeio / doublemetaphone
Python wrapper for a C++ Double Metaphone
☆15 · Jan 12, 2026 · Updated 6 months ago
datamade / probablepeople
a python library for parsing unstructured western names into name components.
☆622 · May 15, 2025 · Updated last year
DistrictDataLabs / entity-resolution
Tutorial code and data for the entity resolution workshops.
☆45 · Jul 15, 2015 · Updated 11 years ago
dedupeio / address-matching
Python script for matching a list of messy addresses against a gazetteer using dedupe.
☆64 · Mar 31, 2020 · Updated 6 years ago
sqgly / SpotifyDataAnalysis
This application scrapes data from the Spotify API using the lightweight python library Spotipy. The data is transformed and processed fo…
☆11 · Oct 17, 2019 · Updated 6 years ago
scify / JedAIToolkit
An open source, high scalability toolkit in Java for Entity Resolution.
☆226 · Jul 12, 2025 · Updated last year
YannBrrd / elasticsearch-entity-resolution
Elasticsearch entity resolution plugin based on Duke
☆210 · May 27, 2020 · Updated 6 years ago
ContinuumIO / topik
A Topic Modeling toolbox
☆93 · Apr 26, 2016 · Updated 10 years ago
dedupeio / dedupe-geocoder
Demonstration of how dedupe might be used as geocoder
☆17 · Jun 21, 2022 · Updated 4 years ago
Gaglia88 / sparker
SparkER: an Entity Resolution framework for Apache Spark
☆67 · Mar 29, 2024 · Updated 2 years ago
jamesturk / jellyfish
🪼 a python library for doing approximate and phonetic matching of strings.
☆2,227 · Updated this week
bradhackinen / nama
Fast, flexible name matching for large datasets
☆71 · Aug 29, 2025 · Updated 10 months ago
anhaidgroup / py_entitymatching
☆193 · May 29, 2024 · Updated 2 years ago
cmharlow / GetUrRecon
All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms.
☆24 · Feb 5, 2022 · Updated 4 years ago
ropeladder / record-linkage-resources
Resources for tackling record linkage / deduplication / data matching problems
☆127 · Feb 22, 2024 · Updated 2 years ago
cjdd3b / car-datascience-toolkit
Simple implementations of data science tools for use by newspaper reporters.
☆20 · Jun 5, 2012 · Updated 14 years ago
awslabs / sagemaker-graph-entity-resolution
☆17 · May 3, 2024 · Updated 2 years ago
seatgeek / fuzzywuzzy
Fuzzy String Matching in Python
☆9,262 · Feb 24, 2023 · Updated 3 years ago
ireapps / install-guides
Install guides for IRE/NICAR conferences.
☆16 · Mar 16, 2018 · Updated 8 years ago