data61 / blocklib
Python implementations of record linkage blocking techniques.
☆19 Updated last year
Alternatives and similar repositories for blocklib:
Users who are interested in blocklib are comparing it to the libraries listed below.
- CLK hash: hash PII for entity matching ☆47 Updated last year
- Python implementation of anonymous linkage using cryptographic linkage keys ☆65 Updated 8 months ago
- Privacy Preserving Record Linkage Service ☆26 Updated last year
- Python wrapper for a C++ Double Metaphone ☆15 Updated 2 years ago
- ☆13 Updated 5 years ago
- A maximum-strength name parser for record linkage. ☆36 Updated 5 months ago
- Collaboration app for sharing and reviewing Jupyter notebooks ☆16 Updated last year
- ☆15 Updated 2 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆29 Updated 2 years ago
- Demo of an in-database processing tool for scikit-learn ☆13 Updated 2 years ago
- Advanced similarity and duplicate source code at scale. ☆55 Updated 5 years ago
- Plugin for Intake to read from SQL servers ☆15 Updated last year
- Datasette plugin for authenticating access using API tokens ☆11 Updated 4 months ago
- Collaborative NLP annotation tool supporting enterprise authentication, inter-annotator statistics, active learning ☆13 Updated last year
- A selection of business datasets ☆17 Updated 5 years ago
- Utilities for filesystem exploration and automated builds ☆21 Updated 3 weeks ago
- A tool to read CSV files with CSVW metadata and transform them into other formats. ☆32 Updated 5 years ago
- Render Jupyter Notebooks With Metaflow Cards ☆25 Updated 4 months ago
- A financial disclosure data extraction tool. ☆13 Updated last year
- This repository contains code to build an MVP search engine with a Google-like interface. ☆15 Updated 4 years ago
- Resources for tackling record linkage / deduplication / data matching problems ☆116 Updated 11 months ago
- Copy Pandas DataFrames and HDF5 files to PostgreSQL database ☆53 Updated 2 weeks ago
- Uses your app logs to visualize how the data moves between the code, database, HTTP services, message queue, external storages etc. ☆23 Updated 9 months ago
- Record matching and entity resolution at scale in Spark ☆32 Updated last year
- This project is created to promote and advocate the use of FOSS machine learning. ☆44 Updated 4 months ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆55 Updated last month
- This is a crawler for crawling papers from Google Scholar (http://scholar.google.com). Credits for this code goes to (https://github.com/… ☆11 Updated 7 years ago
- Generate database schema, documentation, and other artifacts from an Entity-Relationship diagram, which is created as a GraphML file usin… ☆17 Updated 4 years ago
- A Scalable Data Cleaning Library for PySpark. ☆26 Updated 5 years ago
- Python bindings for Neo4j ☆26 Updated 10 years ago