larsga / Duke
Duke is a fast and flexible deduplication engine written in Java.
☆621 · Updated last year
Alternatives and similar repositories for Duke
Users who are interested in Duke are comparing it to the libraries listed below:
- Elasticsearch entity resolution plugin based on Duke ☆210 · Updated 4 years ago
- Behemoth is an open source platform for large-scale document analysis based on Apache Hadoop. ☆281 · Updated 7 years ago
- Entity resolution for Elasticsearch. ☆159 · Updated 4 months ago
- Data Integration Graph ☆206 · Updated 6 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆382 · Updated 2 years ago
- ☆184 · Updated 6 years ago
- Banana for Solr: a port of Kibana ☆670 · Updated 9 months ago
- A bunch of fancy soft string matching routines, with some accompanying datasets ☆56 · Updated 7 years ago
- A Java library for stored queries ☆375 · Updated 2 years ago
- Java and REST APIs for working with a time-representing tree in Neo4j ☆208 · Updated 4 years ago
- Cassovary is a simple big graph processing library for the JVM ☆1,048 · Updated 3 years ago
- Elasticsearch Index Termlist ☆117 · Updated 6 years ago
- Query preprocessor for Java-based search engines (Querqy Core and Solr implementation) ☆184 · Updated last week
- Browser-driven explorer for Lucene indexes ☆74 · Updated 3 years ago
- A pure JavaScript front end for Elasticsearch search indices. ☆79 · Updated 7 years ago
- Solr query parser plugin that performs proper query-time synonym expansion. ☆150 · Updated 4 years ago
- TinkerPop 3 implementation on an Elasticsearch backend ☆70 · Updated 9 years ago
- An RDF plugin for Solr ☆114 · Updated 3 months ago
- Dice Solr Plugins from Simon Hughes, Dice.com ☆87 · Updated 4 years ago
- Examples for using the dedupe library ☆412 · Updated 9 months ago
- Neo4j-based recommendation engine module with real-time and pre-computed recommendations. ☆376 · Updated 4 years ago
- An open source, high-scalability toolkit in Java for entity resolution. ☆218 · Updated last year
- A text tagger based on Lucene / Solr, using FST technology ☆176 · Updated last year
- ☆61 · Updated 7 months ago
- Chalk is a natural language processing library. ☆258 · Updated 8 years ago
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 · Updated last year
- Scalable query engine for web scraping/data mashup/acceptance QA, powered by Apache Spark ☆142 · Updated 2 weeks ago
- String metrics and phonetic algorithms for Scala (e.g. Dice/Sorensen, Hamming, Jaccard, Jaro, Jaro-Winkler, Levenshtein, Metaphone, N-Gr… ☆485 · Updated 7 years ago
- **Archived** Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured predi… ☆471 · Updated 5 years ago
- GraphAware Neo4j Framework ☆244 · Updated 4 years ago