larsga / Duke
Duke is a fast and flexible deduplication engine written in Java
☆626 Updated 2 years ago
Alternatives and similar repositories for Duke
Users who are interested in Duke are comparing it to the libraries listed below.
- Elasticsearch entity resolution plugin based on Duke ☆209 Updated 5 years ago
- A java library for stored queries ☆378 Updated 2 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆283 Updated 7 years ago
- Data Integration Graph ☆207 Updated 7 years ago
- Solr query parser plugin that performs proper query-time synonym expansion. ☆150 Updated 4 years ago
- Banana for Solr - A Port of Kibana ☆672 Updated 5 months ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆382 Updated 3 years ago
- Generates more or less realistic log data for testing simple aggregation queries. ☆263 Updated 2 years ago
- Elasticsearch Index Termlist ☆118 Updated 6 years ago
- Query preprocessor for Java-based search engines (Querqy Core and Lucene implementation) ☆189 Updated last week
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆276 Updated 3 years ago
- (deprecated) High performance Elasticsearch percolator ☆47 Updated 6 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆127 Updated 10 years ago
- ☆92 Updated 10 years ago
- A bunch of fancy soft string matching routines, with some accompanying datasets ☆56 Updated 8 years ago
- Dice Solr Plugins from Simon Hughes Dice.com ☆88 Updated 4 years ago
- Fabric-based framework for deploying and managing SolrCloud clusters in the cloud. ☆90 Updated 6 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 Updated 5 years ago
- Browser-driven explorer for lucene indexes ☆74 Updated 4 years ago
- A text tagger based on Lucene / Solr, using FST technology ☆177 Updated 2 years ago
- Entity Extraction Text Processor ☆149 Updated 2 years ago
- command line tool for Apache Lucene ☆164 Updated last week
- An open-source, vendor-neutral data context service. ☆161 Updated 7 years ago
- a pure javascript frontend for ElasticSearch search indices. ☆80 Updated 7 years ago
- TinkerPop 3 implementation on Elasticsearch backend ☆70 Updated 10 years ago
- Create custom user experiences for your Fusion-powered apps. ☆37 Updated 5 years ago
- REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models ☆589 Updated last month
- Carrot2 plugin for ElasticSearch ☆294 Updated 3 years ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector ☆252 Updated 8 years ago
- Similarity or Distance Metrics, e.g. Levenshtein, for Java ☆358 Updated 4 years ago