david-siqi-liu / sparklyclean
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark
☆10 Updated 4 years ago
Alternatives and similar repositories for sparklyclean
Users who are interested in sparklyclean are comparing it to the libraries listed below.
- SparkER: an Entity Resolution framework for Apache Spark ☆65 Updated last year
- ☆15 Updated 2 years ago
- UI for JedAI Toolkit ☆17 Updated 3 years ago
- JedAI-WebApp is a GUI that facilitates the execution of JedAI. JedAI is an open source, high scalability toolkit that offers out-of-the-b… ☆23 Updated 2 years ago
- deep entity resolution lite version ☆11 Updated 5 years ago
- An open source, high scalability toolkit in Java for Entity Resolution. ☆218 Updated last year
- ☆77 Updated 2 years ago
- Collection of some algorithms for entity resolution ☆28 Updated 9 years ago
- Rheem - a cross-platform data processing system ☆5 Updated 3 years ago
- ☆190 Updated last year
- End-to-End Deep Entity Resolution ☆31 Updated 3 years ago
- Source code for several Metanome data profiling algorithms ☆54 Updated 2 years ago
- LSHDB is a parallel and distributed data engine, which relies on Locality-Sensitive Hashing and noSQL systems, for performing record link… ☆31 Updated 2 years ago
- Record Linkage ToolKit (Find and link entities) ☆110 Updated last year
- An example of Spark and GraphX with Twitter as sample ☆19 Updated 8 years ago
- A Scalable Data Cleaning Library for PySpark. ☆27 Updated 6 years ago
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints. ☆25 Updated 6 years ago
- A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching … ☆88 Updated last week
- WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing,… ☆110 Updated 3 years ago
- Explaining Inference Queries with Bayesian Optimization ☆10 Updated 4 years ago
- Use faker cypher functions to generate demo and test data with cypher ☆34 Updated 2 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- Condor allows for the specification of synopsis-based streaming jobs on top of general dataflow systems. Condor provides a collection of … ☆13 Updated 11 months ago
- A Generalized Data Cleaning System ☆50 Updated 9 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity res… ☆17 Updated 4 years ago
- Apache NiFi NLP Processor ☆18 Updated last year
- Dremio Flight connector. Access Dremio using Arrow flight ☆40 Updated 4 years ago
- Loads LDBC social graph data into Flink DataSets ☆10 Updated 8 months ago
- Implementation of TANE for experimental purposes ☆13 Updated 3 years ago