larsga / Duke
Duke is a fast and flexible deduplication engine written in Java
☆619 Updated last year
Alternatives and similar repositories for Duke:
Users who are interested in Duke are comparing it to the libraries listed below.
- Elasticsearch entity resolution plugin based on Duke ☆210 Updated 4 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆382 Updated 2 years ago
- A java library for stored queries ☆375 Updated 2 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆128 Updated 9 years ago
- Data Integration Graph ☆206 Updated 6 years ago
- Spark RDD with Lucene's query and entity linkage capabilities ☆125 Updated this week
- ☆92 Updated 9 years ago
- MADlib has moved to Apache MADlib (incubating). Please send pull requests to the Apache repository. ☆507 Updated 7 years ago
- Elasticsearch Index Termlist ☆117 Updated 5 years ago
- ☆184 Updated 6 years ago
- Neo4j-based recommendation engine module with real-time and pre-computed recommendations. ☆376 Updated 3 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆281 Updated 6 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆470 Updated 7 years ago
- Generates more or less realistic log data for testing simple aggregation queries. ☆257 Updated last year
- GraphAware Framework Module for Integrating Neo4j with Elasticsearch ☆261 Updated 3 years ago
- BlinkDB: Sub-Second Approximate Queries on Very Large Data. ☆660 Updated 11 years ago
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 Updated last year
- Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statisti… ☆1,084 Updated last year
- KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apach… ☆1,182 Updated 8 years ago
- An open source, high scalability toolkit in Java for Entity Resolution. ☆218 Updated 11 months ago
- TinkerPop 3 implementation on Elasticsearch backend