larsga / Duke
Duke is a fast and flexible deduplication engine written in Java
☆615 · Updated last year
Related projects
Alternatives and complementary repositories for Duke
- Elasticsearch entity resolution plugin based on Duke · ☆210 · Updated 4 years ago
- A Java library for stored queries · ☆374 · Updated last year
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. · ☆382 · Updated last year
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. · ☆128 · Updated 8 years ago
- Banana for Solr - A Port of Kibana · ☆668 · Updated 3 months ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. · ☆281 · Updated 6 years ago
- REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models · ☆580 · Updated 2 months ago
- Text classification using Naive Bayes and Elasticsearch · ☆154 · Updated 8 years ago
- Generates more or less realistic log data for testing simple aggregation queries. · ☆257 · Updated 11 months ago
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. · ☆445 · Updated 11 months ago
- Graphify is a Neo4j unmanaged extension used for document and text classification using graph-based hierarchical pattern recognition. · ☆382 · Updated 4 years ago
- Data Integration Graph · ☆205 · Updated 6 years ago
- Elasticsearch Index Termlist · ☆117 · Updated 5 years ago
- Entity resolution for Elasticsearch. · ☆157 · Updated 3 months ago
- A platform for real-time streaming search · ☆103 · Updated 8 years ago
- StreamFlow™ is a stream processing tool designed to help build and monitor processing workflows. · ☆253 · Updated 11 months ago
- (deprecated) High performance Elasticsearch percolator · ☆46 · Updated 5 years ago
- An interactive data exploration UI for Druid · ☆646 · Updated 8 years ago
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP · ☆269 · Updated 2 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. · ☆469 · Updated 7 years ago
- Solr query parser plugin that performs proper query-time synonym expansion. · ☆150 · Updated 3 years ago
- Silk is a port of Kibana 4 project. · ☆69 · Updated 8 years ago
- Fabric-based framework for deploying and managing SolrCloud clusters in the cloud. · ☆90 · Updated 5 years ago
- KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apach… · ☆1,182 · Updated 7 years ago
- An open-source, vendor-neutral data context service. · ☆159 · Updated 6 years ago
- TinkerPop 3 implementation on Elasticsearch backend · ☆70 · Updated 9 years ago