mitdbg / lazo
Sketch and LSH Index library for Java, including OPH methods as well as the Lazo method
☆13 · Updated last year
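The description above mentions sketches for set similarity. The following is a minimal sketch of the general idea, assuming classic MinHash with k independent linear hash functions. It is not lazo's API, and it implements neither its OPH (one permutation hashing) methods nor the Lazo estimator. The class `MinHashSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration of a classic k-hash MinHash sketch.
// NOT the lazo library's API; OPH and Lazo estimators are not shown.
public class MinHashSketch {
    private static final long PRIME = 2_147_483_647L; // Mersenne prime 2^31 - 1
    private final long[] a;
    private final long[] b;

    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a_i in [1, PRIME - 2], never 0
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // b_i in [0, PRIME - 1)
        }
    }

    // For each h_i(x) = (a_i * x + b_i) mod PRIME, keep the minimum value over the set.
    public long[] signature(Set<String> items) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String item : items) {
            long x = Integer.toUnsignedLong(item.hashCode()) % PRIME;
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME; // product < 2^62, so no overflow
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // The fraction of positions where two signatures agree estimates Jaccard similarity.
    public static double estimateJaccard(long[] s1, long[] s2) {
        if (s1.length != s2.length) {
            throw new IllegalArgumentException("signatures must have equal length");
        }
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(256, 42L);
        Set<String> s1 = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> s2 = new HashSet<>(Arrays.asList("a", "b", "c", "e"));
        // The true Jaccard similarity is 3/5 = 0.6; the estimate should land near it.
        System.out.printf("estimated Jaccard: %.3f%n",
                estimateJaccard(mh.signature(s1), mh.signature(s2)));
    }
}
```

Each element here costs k hash evaluations. Roughly speaking, one permutation hashing methods such as those the description mentions reduce this to a single hash per element by splitting the hash range into bins. This sketch does not attempt that.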
Alternatives and similar repositories for lazo:
Users who are interested in lazo are comparing it to the libraries listed below.
- Graph Engine for Exploration and Search ☆40 · Updated last year
- FlexMatcher is a schema matching package in Python which handles the problem of matching multiple schemas to a single mediated schema. ☆29 · Updated 2 months ago
- A Jupyter notebook extension to centralize and manage data ☆14 · Updated 2 years ago
- Code and Benchmarks for JOSIE (SIGMOD 2019) ☆18 · Updated last year
- deep entity resolution lite version ☆11 · Updated 5 years ago
- Scalable String Similarity Joins in Python ☆38 · Updated 7 months ago
- Algorithms for "schema matching" ☆26 · Updated 8 years ago
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies ☆19 · Updated 2 years ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index ☆42 · Updated last year
- Benchmark Datasets for Set Similarity Search ☆12 · Updated 6 years ago
- ☆13 · Updated 5 years ago
- SparkER: an Entity Resolution framework for Apache Spark ☆63 · Updated 10 months ago
- ☆75 · Updated last year
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆26 · Updated last year
- Explaining Inference Queries with Bayesian Optimization ☆10 · Updated 4 years ago
- Project overview and links to various resources ☆18 · Updated 3 years ago
- MinHash implementation in Python ☆11 · Updated 5 months ago
- ☆16 · Updated 9 years ago
- Extracting Entities with Limited Evidence ☆16 · Updated 2 years ago
- Scripts for ECML PKDD 2018 article: Similarity encoding for learning with dirty categorical variables ☆11 · Updated 6 years ago
- A new framework to generate interpretable classification rules ☆17 · Updated 2 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 · Updated 2 months ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- This project focuses on DeepER, a deep learning framework for entity resolution (record deduplication). It examines how DeepER performs o… ☆46 · Updated 6 years ago
- python3 package supporting efficient storage and querying of sets of sets using the trie data structure. Supports finding all the superse… ☆23 · Updated last year
- Scripts for paper "Encoding high-cardinality string categorical variables" ☆24 · Updated 5 years ago
- End-to-End Deep Entity Resolution ☆30 · Updated 3 years ago
- A fast high dimensional near neighbor search algorithm based on group testing and locality sensitive hashing ☆19 · Updated last year
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated 11 months ago