moj-analytical-services / splink_demos
Interactive notebooks containing demonstration code of the splink library
☆37 Updated last year
Alternatives and similar repositories for splink_demos:
Users who are interested in splink_demos are comparing it to the libraries listed below.
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- A python package to create a database on the platform using our moj data warehousing framework ☆21 Updated 6 months ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆26 Updated last year
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- Fast, flexible name matching for large datasets ☆70 Updated last year
- A maximum-strength name parser for record linkage. ☆36 Updated last month
- Data Scientist code test ☆19 Updated 4 years ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆53 Updated last year
- A browser user interface for manual labeling of record pairs. ☆45 Updated last year
- A tutorial on entity resolution (record linkage or de-duplication) ☆63 Updated 4 years ago
- Prototype search engine for ONS bulletins ☆23 Updated 10 months ago
- Tool for probabilistically linking the records of individual entities (e.g. people) within and across datasets ☆109 Updated 3 months ago
- Buy Till You Die and Customer Lifetime Value statistical models in Python. ☆116 Updated 9 months ago
- Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018). ☆14 Updated 6 years ago
- pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other do… ☆10 Updated last year
- Fully unit tested utility functions for data engineering. Python 3 only. ☆15 Updated 6 months ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago
- Abstractions for feature engineering on large graphs of tabular data. ☆21 Updated last month
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- Notebooks for the ML Link Prediction Course ☆14 Updated 4 years ago
- Predict whether a student will correctly answer a problem based on past performance using automated feature engineering ☆32 Updated 4 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 Updated 3 years ago
- Causal Inference Using Quasi-Experimental Methods ☆20 Updated 4 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" ☆36 Updated 7 months ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- variations of the record linkage model of Steorts et al. AISTATS 2014's "SMERED: A Bayesian Approach to Graphical Record Linkage and De-d… ☆27 Updated 7 years ago
- Resources for tackling record linkage / deduplication / data matching problems ☆120 Updated last year