mitdbg / lazo
Sketch and LSH Index library for Java, including OPH methods as well as the Lazo method
☆13 · Updated last year
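OPH is commonly short for one-permutation hashing, a MinHash variant that hashes each element only once, splits the hash range into k bins, and keeps the minimum hash per bin. The following is a minimal Java sketch of that general idea with the basic (non-densified) Jaccard estimator. The class `OphSketch`, its methods, and the choice of a Murmur3-style finalizer over `String.hashCode()` are hypothetical and for illustration only. This is not lazo's API, and it does not implement the Lazo method itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical one-permutation hashing (OPH) sketch for illustration; NOT the lazo library's API.
final class OphSketch {
    private static final long EMPTY = -1L; // marks a bin that received no element

    private final long[] bins;

    private OphSketch(long[] bins) {
        this.bins = bins;
    }

    // Assumed hash: Murmur3 32-bit finalizer applied to String.hashCode(), read as unsigned.
    private static long hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return Integer.toUnsignedLong(h); // value in [0, 2^32)
    }

    // Hash every element once, split [0, 2^32) into k equal bins, keep the minimum hash per bin.
    static OphSketch of(Set<String> elements, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        long[] bins = new long[k];
        Arrays.fill(bins, EMPTY);
        for (String e : elements) {
            long u = hash(e);
            int bin = (int) ((u * k) >>> 32); // floor(u * k / 2^32), always in [0, k)
            if (bins[bin] == EMPTY || u < bins[bin]) {
                bins[bin] = u;
            }
        }
        return new OphSketch(bins);
    }

    // Basic OPH estimator: matching bins / (k - bins that are empty in both sketches).
    static double estimateJaccard(OphSketch a, OphSketch b) {
        if (a.bins.length != b.bins.length) throw new IllegalArgumentException("sketch sizes differ");
        int matches = 0;
        int jointlyEmpty = 0;
        for (int i = 0; i < a.bins.length; i++) {
            boolean emptyA = a.bins[i] == EMPTY;
            boolean emptyB = b.bins[i] == EMPTY;
            if (emptyA && emptyB) {
                jointlyEmpty++;
            } else if (!emptyA && a.bins[i] == b.bins[i]) {
                matches++;
            }
        }
        int informative = a.bins.length - jointlyEmpty;
        return informative == 0 ? Double.NaN : (double) matches / informative; // NaN when both sets are empty
    }

    public static void main(String[] args) {
        Set<String> x = new HashSet<>(Set.of("a", "b", "c", "d"));
        Set<String> y = new HashSet<>(Set.of("c", "d", "e", "f"));
        double j = estimateJaccard(of(x, 16), of(y, 16));
        System.out.println("estimated Jaccard = " + j); // exact Jaccard of x and y is 2/6
    }
}
```

Practical OPH implementations typically add a densification step to fill empty bins so that the sketch also supports LSH banding; this sketch omits it to keep the core idea visible.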
Alternatives and similar repositories for lazo:
Users interested in lazo are comparing it to the libraries listed below.
- A Jupyter notebook extension to centralize and manage data ☆14 · Updated 2 years ago
- FlexMatcher is a schema matching package in Python which handles the problem of matching multiple schemas to a single mediated schema. ☆29 · Updated 3 months ago
- Graph Engine for Exploration and Search ☆40 · Updated last year
- Algorithms for "schema matching" ☆26 · Updated 8 years ago
- ☆76 · Updated 2 years ago
- ☆11 · Updated last year
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies ☆19 · Updated 2 years ago
- Code and Benchmarks for JOSIE (SIGMOD 2019) ☆18 · Updated last year
- Project overview and links to various resources ☆18 · Updated 3 years ago
- Explaining Inference Queries with Bayesian Optimization ☆10 · Updated 4 years ago
- Pattern-based table discovery in Open Data CSV files ☆25 · Updated 2 years ago
- Deep entity resolution, lite version ☆11 · Updated 5 years ago
- Scalable String Similarity Joins in Python ☆38 · Updated 8 months ago
- quadipy is a Python package that helps transform structured data into RDF graph format ☆19 · Updated last year
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 · Updated 2 years ago
- Welcome to Snowman App – a Data Matching Benchmark Platform. ☆37 · Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- Code repository for Mondrian, a project for multiregion template recognition in spreadsheets. ☆14 · Updated 2 years ago
- Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index ☆42 · Updated last year
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 · Updated 3 months ago
- S2RDF (SPARQL on Spark for RDF) is a SPARQL query processor for Hadoop based on Spark SQL. It uses the relational interface of Spark for … ☆13 · Updated 6 years ago
- ☆15 · Updated 2 years ago
- Extracting Entities with Limited Evidence ☆16 · Updated 2 years ago
- Benchmarking the Chase ☆9 · Updated 7 years ago
- Yet another easy-to-use Python 3 parallel library for humans. ☆13 · Updated 4 years ago
- A Generalized Data Cleaning System ☆49 · Updated 8 years ago
- ☆20 · Updated last year
- Efficient set similarity search algorithms implemented in Go ☆30 · Updated 2 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 3 years ago
- A repository for the "Combining DBpedia and Topic Modeling" GSoC 2016 idea ☆13 · Updated 8 years ago