larsga/Duke

Duke is a fast and flexible deduplication engine written in Java

☆622

Alternatives and similar repositories for Duke

Users who are interested in Duke are comparing it to the libraries listed below.

YannBrrd / elasticsearch-entity-resolution
Elasticsearch entity resolution plugin based on Duke
☆210 · May 27, 2020 · Updated 6 years ago
dedupeio / dedupe
A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
☆4,487 · Jul 29, 2025 · Updated last year
drangons / entity_resolution_spark
A collection of algorithms for entity resolution
☆28 · Sep 7, 2015 · Updated 10 years ago
scify / JedAIToolkit
An open source, high scalability toolkit in Java for Entity Resolution.
☆226 · Jul 12, 2025 · Updated last year
J535D165 / recordlinkage
A powerful and modular toolkit for record linkage and duplicate detection in Python
☆1,056 · Feb 21, 2024 · Updated 2 years ago
carrot2 / elasticsearch-carrot2
Carrot2 plugin for Elasticsearch
☆294 · Jan 2, 2023 · Updated 3 years ago
dedupeio / dedupe-geocoder
Demonstration of how dedupe might be used as a geocoder
☆17 · Jun 21, 2022 · Updated 4 years ago
visallo / vertexium
High-security graph database
☆65 · Jun 30, 2022 · Updated 4 years ago
elsevierlabs-os / soda
Solr Dictionary Annotator (Microservice for Spark)
☆71 · Feb 4, 2020 · Updated 6 years ago
dedupeio / dedupe-examples
Examples for using the dedupe library
☆417 · Aug 10, 2024 · Updated last year
J535D165 / data-matching-software
A list of free data matching and record linkage software.
☆406 · Feb 21, 2024 · Updated 2 years ago
TeamCohen / secondstring
A bunch of fancy soft string matching routines, with some accompanying datasets
☆56 · Aug 10, 2017 · Updated 8 years ago
zouzias / spark-lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
☆129 · Jun 23, 2026 · Updated last month
kxtells / vague-places
☆14 · Dec 24, 2016 · Updated 9 years ago
codelibs / elasticsearch-taste
Mahout Taste-based recommendation on Elasticsearch
☆336 · Oct 25, 2019 · Updated 6 years ago
Gaglia88 / sparker
SparkER: an Entity Resolution framework for Apache Spark
☆67 · Mar 29, 2024 · Updated 2 years ago
kojisekig / KEA-lucene
☆17 · Jul 15, 2016 · Updated 10 years ago
nexacenter / public-contracts
☆10 · Apr 20, 2016 · Updated 10 years ago
carocad / clemence
Fast and incremental Levenshtein and LCS computation
☆20 · Jun 23, 2016 · Updated 10 years ago
TillerBurr / dash-query-builder
Dash Component created from ukrbublik/react-awesome-query-builder
☆13 · Updated this week
OpenSextant / opensextant
Deprecated Module: See Xponents or OpenSextantToolbox as active code base.
☆31 · Jul 24, 2013 · Updated 13 years ago
wikimedia / search-extra
GitHub mirror of "search/extra" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for c…
☆56 · Jul 15, 2026 · Updated 2 weeks ago
bbc / rdfspace
RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset.
☆41 · Nov 8, 2013 · Updated 12 years ago
Simmetrics / simmetrics
Similarity or Distance Metrics, e.g. Levenshtein, for Java
☆360 · Aug 26, 2021 · Updated 4 years ago
okfn / pdcalc
Public Domain Calculators - determine what is public domain and what's not.
☆16 · Jan 30, 2024 · Updated 2 years ago
idealista / tlsh
Java port of TLSH (Trend Micro Locality Sensitive Hash)
☆25 · Apr 26, 2021 · Updated 5 years ago
pudo / ted
Scraper for public procurement data from the EU's Tenders Electronic Daily (TED)
☆20 · Dec 29, 2015 · Updated 10 years ago
elizabethsiegle / nba-stats-twilio-sms-bot
Compare 2 basketball players by reading/comparing NBA stats in an Excel sheet.
☆11 · Aug 19, 2018 · Updated 7 years ago
datamade / probablepeople
A Python library for parsing unstructured Western names into name components.
☆622 · May 15, 2025 · Updated last year
uzh / fox
A framework for PSL inference.
☆22 · Nov 9, 2015 · Updated 10 years ago
datamade / parserator
A toolkit for making domain-specific probabilistic parsers
☆812 · Sep 26, 2024 · Updated last year
hbase4s / hbase4s
User-friendly HBase API for Scala
☆15 · Nov 20, 2020 · Updated 5 years ago
wammar / bayesian-record-linkage
Variations of the record linkage model of Steorts et al. AISTATS 2014's "SMERED: A Bayesian Approach to Graphical Record Linkage and De-d…
☆26 · Mar 13, 2017 · Updated 9 years ago
OpenRefine / OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
☆11,930 · Updated this week
TellMeFirst / tellmefirst
TellMeFirst is a tool for classifying and enriching textual documents via Linked Open Data.
☆25 · Sep 1, 2022 · Updated 3 years ago
sirensolutions / kibi
PLEASE READ: Kibi is now "Siren Investigate", part of the Siren Platform. This code repository is only provided to facilitate code revi…
☆502 · Jun 28, 2024 · Updated 2 years ago
walmartlabs / mupd8
Muppet
☆128 · May 7, 2021 · Updated 5 years ago
JorenSix / TarsosLSH
A Java library implementing practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It…
☆201 · Jul 26, 2020 · Updated 6 years ago
OpenGravestones / OpenGravestones
A project to provide open burial data built on open standards.
☆19 · Oct 2, 2015 · Updated 10 years ago