Duke is a fast and flexible deduplication engine written in Java
☆626 · Oct 11, 2023 · Updated 2 years ago
Alternatives and similar repositories for Duke
Users who are interested in Duke are comparing it to the libraries listed below.
- Elasticsearch entity resolution plugin based on Duke ☆209 · May 27, 2020 · Updated 5 years ago
- A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,440 · Jul 29, 2025 · Updated 7 months ago
- Entity resolution for Elasticsearch. ☆166 · Mar 1, 2026 · Updated last week
- An open source, high scalability toolkit in Java for Entity Resolution. ☆222 · Jul 12, 2025 · Updated 7 months ago
- The Berkeley Entity Resolution System jointly solves the problems of named entity recognition, coreference resolution, and entity linking… ☆187 · Dec 7, 2019 · Updated 6 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 · Feb 4, 2020 · Updated 6 years ago
- Carrot2 plugin for ElasticSearch ☆294 · Jan 2, 2023 · Updated 3 years ago
- Resources for tackling record linkage / deduplication / data matching problems ☆126 · Feb 22, 2024 · Updated 2 years ago
- Distributed restful text mining. ☆21 · Jan 19, 2016 · Updated 10 years ago
- Examples for using the dedupe library ☆419 · Aug 10, 2024 · Updated last year
- A list of free data matching and record linkage software. ☆401 · Feb 21, 2024 · Updated 2 years ago
- ☆14 · Dec 24, 2016 · Updated 9 years ago
- RDFSpace constructs a vector space from any RDF dataset which can be used for computing similarities between resources in that dataset. ☆41 · Nov 8, 2013 · Updated 12 years ago
- ☆17 · Jul 15, 2016 · Updated 9 years ago
- A version of Stan Salvador and Philip Chan's "FastDTW" dynamic time warping implementation modified for use as a library in production ap… ☆34 · May 10, 2012 · Updated 13 years ago
- The Text Analysis with Amazon Comprehend and Amazon OpenSearch Service solution is an automated reference implementation that deploys a c… ☆34 · Oct 23, 2024 · Updated last year
- Deprecated Module: See Xponents or OpenSextantToolbox as active code base. ☆31 · Jul 24, 2013 · Updated 12 years ago
- Command line tool for deduplicating CSV files ☆434 · Mar 31, 2020 · Updated 5 years ago
- PLEASE READ: Kibi is now "Siren Investigate", part of the Siren Platform. This code repository is only provided to facilitate code revi… ☆499 · Jun 28, 2024 · Updated last year
- Unix look utility analog which is blazingly fast and works with big files ☆18 · May 27, 2022 · Updated 3 years ago
- Actionable memory analysis for JVM languages ☆17 · Dec 29, 2017 · Updated 8 years ago
- Mahout Taste-based recommendation on Elasticsearch ☆335 · Oct 25, 2019 · Updated 6 years ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆276 · Nov 5, 2022 · Updated 3 years ago
- SparkER: an Entity Resolution framework for Apache Spark ☆65 · Mar 29, 2024 · Updated last year
- A general purpose graph library ☆11 · Jun 21, 2018 · Updated 7 years ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources. ☆20 · Oct 3, 2024 · Updated last year
- A toolkit for making domain-specific probabilistic parsers ☆806 · Sep 26, 2024 · Updated last year
- Document clustering based on Latent Semantic Analysis ☆96 · Apr 29, 2010 · Updated 15 years ago
- A library for accelerating data compression using Intel® QAT. ☆21 · Feb 26, 2026 · Updated last week
- GraphAware Framework Module for Integrating Neo4j with Elasticsearch ☆266 · May 5, 2021 · Updated 4 years ago
- Documentation, examples and utilities for Flatline, BigML's dataset transformation and generation language ☆26 · Mar 18, 2025 · Updated 11 months ago
- A framework for PSL inference. ☆21 · Nov 9, 2015 · Updated 10 years ago
- This library provides a Java library for H264 and MJPEG encoding and decoding as well as support for MP4. ☆16 · Oct 3, 2025 · Updated 5 months ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆475 · Apr 18, 2017 · Updated 8 years ago
- Shell script automation to support csv2rdf4lod converter ☆112 · Dec 13, 2021 · Updated 4 years ago
- A Python library for parsing unstructured western names into name components. ☆616 · May 15, 2025 · Updated 9 months ago
- An RDF plugin for Solr ☆114 · Jan 27, 2025 · Updated last year
- A Java library for stored queries ☆379 · Mar 8, 2023 · Updated 3 years ago
- A compact implementation of Dr. Askitis HatTrie ☆80 · May 20, 2014 · Updated 11 years ago