data61 / blocklib
Python implementations of record linkage blocking techniques.
☆21 Updated 2 years ago
Alternatives and similar repositories for blocklib
Users who are interested in blocklib are comparing it to the libraries listed below.
- CLK hash: hash pii for entity matching ☆48 Updated 8 months ago
- Python implementation of anonymous linkage using cryptographic linkage keys ☆70 Updated last year
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆66 Updated last week
- A maximum-strength name parser for record linkage. ☆39 Updated 5 months ago
- A browser user interface for manual labeling of record pairs. ☆48 Updated 2 years ago
- Scalable String Similarity Joins in Python ☆39 Updated last year
- Record matching and entity resolution at scale in Spark ☆36 Updated 2 years ago
- PyPi module for Graphlet AI Knowledge Graph Factory ☆33 Updated 2 years ago
- Set-oriented Operations in Pandas ☆24 Updated 5 years ago
- Now included in rigour ☆152 Updated 2 months ago
- Copy Pandas DataFrames and HDF5 files to PostgreSQL database ☆55 Updated 2 months ago
- ☆48 Updated last year
- data wrangling simplicity, complete audit transparency, and at speed ☆35 Updated 4 months ago
- Language detection using Spacy and Fasttext ☆57 Updated 2 years ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆25 Updated 3 years ago
- Framework for processing data packages in pipelines of modular components. ☆123 Updated 7 months ago
- API for OpenSanctions with support for entity search and bulk matching of data collections. Supports Reconciliation API spec. ☆121 Updated this week
- MLOps simplified. One-stop AI delivery platform, all the features you need. ☆106 Updated last week
- Generating Realistic Synthetic Data ☆41 Updated last year
- Record Linkage ToolKit (Find and link entities) ☆111 Updated 2 years ago
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 Updated last month
- An automation tool to refactor Jupyter Notebooks to Python modules, with code dependency analysis. ☆12 Updated 11 months ago
- real-time data + ML pipeline ☆53 Updated this week
- KnowledgeRepo + JupyterLab ☆48 Updated 2 weeks ago
- "1 + 1 = 1 or Record Deduplication with Python" Jupyter Notebook ☆84 Updated 3 years ago
- Hierarchical Clustering Algorithms ☆36 Updated 3 years ago
- 🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ☆141 Updated 2 years ago
- Advanced similarity and duplicate source code at scale. ☆56 Updated 6 years ago
- Reference Graph Gists ☆45 Updated 5 years ago
- Metadata and data identification tool and Python library. Identifies PII, common identifiers, language specific identifiers. Fully custom… ☆45 Updated last month