larsga / Duke
Duke is a fast and flexible deduplication engine written in Java
☆619 · Updated last year
Alternatives and similar repositories for Duke:
Users who are interested in Duke are comparing it to the libraries listed below.
- Elasticsearch entity resolution plugin based on Duke ☆210 · Updated 4 years ago
- A java library for stored queries ☆375 · Updated last year
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆383 · Updated 2 years ago
- Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark. ☆128 · Updated 9 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆281 · Updated 6 years ago
- Entity resolution for Elasticsearch. ☆158 · Updated last month
- ☆92 · Updated 9 years ago
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 · Updated last year
- A bunch of fancy soft string matching routines, with some accompanying datasets ☆56 · Updated 7 years ago
- Graphify is a Neo4j unmanaged extension used for document and text classification using graph-based hierarchical pattern recognition. ☆380 · Updated 4 years ago
- MADlib has moved to Apache MADlib (incubating). Please send pull requests to the Apache repository. ☆507 · Updated 7 years ago
- Elasticsearch Index Termlist ☆117 · Updated 5 years ago
- TinkerPop 3 implementation on Elasticsearch backend ☆70 · Updated 9 years ago
- Data Integration Graph ☆207 · Updated 6 years ago
- Similarity or Distance Metrics, e.g. Levenshtein, for Java ☆344 · Updated 3 years ago
- Collection of some algorithms for entity resolution ☆28 · Updated 9 years ago
- Simplifying robust end-to-end machine learning on Apache Spark. ☆470 · Updated 7 years ago
- TinkerPop3 Graph Structure Implementation for OrientDB ☆93 · Updated this week
- Query preprocessor for Java-based search engines (Querqy Core and Solr implementation) ☆183 · Updated this week
- An Elasticsearch ingest processor to do named entity extraction using Apache OpenNLP ☆270 · Updated 2 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 · Updated 5 years ago
- An open source, high scalability toolkit in Java for Entity Resolution. ☆216 · Updated 10 months ago
- A toolkit for making domain-specific probabilistic parsers ☆799 · Updated 4 months ago
- Delimited file loader for Cassandra ☆198 · Updated 5 years ago
- A text tagger based on Lucene / Solr, using FST technology ☆176 · Updated last year
- Neo4j-based recommendation engine module with real-time and pre-computed recommendations. ☆376 · Updated 3 years ago
- Text classification using Naive Bayes and Elasticsearch ☆154 · Updated 8 years ago
- Dice Solr plugins from Simon Hughes (Dice.com) ☆87 · Updated 3 years ago
- High-security graph database ☆62 · Updated 2 years ago
- Mirror of Apache Stanbol (incubating) ☆112 · Updated 11 months ago