data61 / clkhash
CLK hash: hash PII for entity matching
☆47 Updated last year
Related projects
Alternatives and complementary repositories for clkhash
- Python implementation of anonymous linkage using cryptographic linkage keys ☆63 Updated 5 months ago
- Privacy Preserving Record Linkage Service ☆26 Updated last year
- Python implementations of record linkage blocking techniques. ☆19 Updated last year
- A simple command line interface to the datamade/dedupe library. ☆42 Updated last year
- A maximum-strength name parser for record linkage. ☆32 Updated 3 months ago
- Python wrapper for a C++ Double Metaphone ☆15 Updated last year
- Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018). ☆14 Updated 5 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- PMML evaluator library for the PostgreSQL database (http://www.postgresql.org/) ☆11 Updated 9 years ago
- variations of the record linkage model of Steorts et al. AISTATS 2014's "SMERED: A Bayesian Approach to Graphical Record Linkage and De-d… ☆27 Updated 7 years ago
- Resources for tackling record linkage / deduplication / data matching problems ☆112 Updated 8 months ago
- A browser user interface for manual labeling of record pairs. ☆41 Updated last year
- Quickly compare changes made to Jupyter notebooks in GitHub repositories with jupydiff! ☆13 Updated last year
- Scalable String Similarity Joins in Python ☆39 Updated 4 months ago
- data wrangling simplicity, complete audit transparency, and at speed ☆34 Updated 2 months ago
- ☆43 Updated 2 years ago
- Dedupe/batch geocode addresses and venues around the world with libpostal ☆82 Updated 2 years ago
- ☆13 Updated 5 years ago
- Data Scientist code test ☆19 Updated 4 years ago
- Perform Bayesian record linkage with a one-to-one matching assumption. ☆11 Updated 4 years ago
- Toolkit of queries for examining a PostgreSQL database, in executable IPython Notebook format. ☆18 Updated 9 years ago
- Framework for processing data packages in pipelines of modular components. ☆119 Updated last year
- National Data Archive (NADA) is an open source data cataloging system that serves as a portal for researchers to browse, search, compare,… ☆38 Updated this week
- Copy Pandas DataFrames and HDF5 files to PostgreSQL database ☆54 Updated 2 months ago
- Exploring sequential data with a sankey diagram ☆22 Updated last year
- Set-oriented Operations in Pandas ☆24 Updated 4 years ago
- @vega transforms with @ibis-project expressions ☆29 Updated 3 years ago
- Split a JSON file with hierarchical data to multiple CSV files ☆28 Updated last year
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆60 Updated this week
- Model drift detection ☆11 Updated last year