david-siqi-liu / sparklyclean
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
☆10 · Updated 4 years ago
Alternatives and similar repositories for sparklyclean:
Users who are interested in sparklyclean are comparing it to the libraries listed below:
- SparkER: an Entity Resolution framework for Apache Spark ☆63 · Updated 9 months ago
- A Java framework to build semantics-aware autoencoder neural networks from a knowledge graph. ☆13 · Updated 7 years ago
- UI for JedAI Toolkit ☆16 · Updated 2 years ago
- End-to-End Deep Entity Resolution ☆31 · Updated 3 years ago
- ☆15 · Updated 2 years ago
- An example of Spark and GraphX using Twitter as a sample ☆19 · Updated 8 years ago
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints. ☆25 · Updated 6 years ago
- Condor allows for the specification of synopsis-based streaming jobs on top of general dataflow systems. Condor provides a collection of … ☆13 · Updated 6 months ago
- A Generalized Data Cleaning System ☆49 · Updated 8 years ago
- LSHDB is a parallel and distributed data engine, which relies on Locality-Sensitive Hashing and noSQL systems, for performing record link… ☆29 · Updated 2 years ago
- Deep entity resolution (lite version) ☆11 · Updated 5 years ago
- Benchmark Datasets for Set Similarity Search ☆12 · Updated 5 years ago
- Vector search in Lucene-based search, attempting to use only the existing Lucene data structures (experimental) ☆43 · Updated 5 years ago
- WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing,… ☆110 · Updated 2 years ago
- ☆75 · Updated last year
- Stanford Entity-Resolution Framework ☆23 · Updated 6 years ago
- ☆32 · Updated 3 years ago
- Library of graph algorithms for Apache Giraph. ☆8 · Updated 9 years ago
- This project provides procedures and functions to support machine learning applications with Neo4j. ☆37 · Updated 6 years ago
- Apache NiFi NLP Processor ☆18 · Updated last year
- Collection of some algorithms for entity resolution ☆28 · Updated 9 years ago
- Dremio Flight connector. Access Dremio using Arrow Flight ☆40 · Updated 4 years ago
- Explaining Inference Queries with Bayesian Optimization ☆10 · Updated 3 years ago
- Project overview and links to various resources ☆18 · Updated 3 years ago
- JedAI-WebApp is a GUI that facilitates the execution of JedAI. JedAI is an open source, high scalability toolkit that offers out-of-the-b… ☆23 · Updated last year
- Query Spark in real time and visualise the results as a graph. ☆24 · Updated 7 years ago
- Fork of the Freely Extensible Biomedical Record Linkage program ☆24 · Updated 8 years ago
- S2RDF (SPARQL on Spark for RDF) is a SPARQL query processor for Hadoop based on Spark SQL. It uses the relational interface of Spark for … ☆13 · Updated 6 years ago
- Temporal_Graph_library ☆25 · Updated 5 years ago
- A single docker image that combines Neo4j Mazerunner and Apache Spark GraphX into a powerful all-in-one graph processing engine ☆46 · Updated 5 years ago