data61 / blocklib
Python implementations of record linkage blocking techniques.
☆21 Updated last year
Alternatives and similar repositories for blocklib
Users who are interested in blocklib are comparing it to the libraries listed below.
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆60 Updated last week
- CLK hash: hash PII for entity matching ☆47 Updated 2 months ago
- Python implementation of anonymous linkage using cryptographic linkage keys ☆65 Updated last year
- Record matching and entity resolution at scale in Spark ☆35 Updated last year
- A maximum-strength name parser for record linkage. ☆38 Updated last month
- A browser user interface for manual labeling of record pairs. ☆47 Updated 2 years ago
- Python wrapper for a C++ Double Metaphone ☆15 Updated last week
- Language detection using spaCy and fastText ☆57 Updated last year
- ☆48 Updated last year
- PyPI module for Graphlet AI Knowledge Graph Factory ☆29 Updated 2 years ago
- Resources for tackling record linkage / deduplication / data matching problems ☆125 Updated last year
- Set-oriented Operations in Pandas ☆24 Updated 5 years ago
- Copy Pandas DataFrames and HDF5 files to PostgreSQL database ☆55 Updated 6 months ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Now included in rigour ☆151 Updated last week
- Comparing Polars to Pandas and a small introduction ☆44 Updated 4 years ago
- Loading OpenSanctions into Neo4J and Linkurious ☆30 Updated 7 months ago
- A tool to read CSV files with CSVW metadata and transform them into other formats. ☆33 Updated 6 years ago
- MirrorDataGenerator is a Python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆23 Updated 3 years ago
- Instant search for and access to many datasets in PySpark. ☆34 Updated 2 years ago
- Application and Python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 Updated 3 years ago
- Data wrangling simplicity, complete audit transparency, and at speed ☆34 Updated 3 weeks ago
- Interactive notebooks containing demonstration code of the splink library ☆38 Updated last year
- PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolut… ☆154 Updated 2 years ago
- Framework for processing data packages in pipelines of modular components. ☆121 Updated last month
- MLOps simplified. One-stop AI delivery platform, all the features you need. ☆100 Updated last week
- A Scalable Data Cleaning Library for PySpark. ☆29 Updated 6 years ago
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Talk "Beyond pandas: The great Python dataframe showdown" ☆37 Updated 2 years ago
- High-performance data retrieval from Neo4j with Apache Arrow 🏹 ☆31 Updated 3 years ago