moj-analytical-services / splink_demos
Interactive notebooks containing demonstration code of the splink library
☆38 · Updated last year
Alternatives and similar repositories for splink_demos:
Users who are interested in splink_demos are comparing it to the libraries listed below.
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 3 years ago
- A Python package to create a database on the platform using our moj data warehousing framework ☆21 · Updated 7 months ago
- A scikit-learn compatible estimator based on business rules, with an interactive dashboard included ☆28 · Updated 3 years ago
- A browser user interface for manual labeling of record pairs. ☆46 · Updated last year
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- ☄️ Parallel and distributed training with spaCy and Ray ☆53 · Updated last year
- Fully unit tested utility functions for data engineering. Python 3 only. ☆16 · Updated 7 months ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆27 · Updated last year
- Simplifies use of the Dedupe library via Pandas ☆135 · Updated 2 years ago
- Prototype search engine for ONS bulletins ☆24 · Updated 11 months ago
- Tool for probabilistically linking the records of individual entities (e.g. people) within and across datasets ☆111 · Updated 4 months ago
- Data Scientist code test ☆19 · Updated 4 years ago
- List of entity resolution software and resources. ☆63 · Updated last month
- Build your feature store with macros right within your dbt repository ☆38 · Updated 2 years ago
- A tutorial on entity resolution (record linkage or de-duplication) ☆62 · Updated 4 years ago
- Python package implementing transformers for preprocessing steps for machine learning. ☆57 · Updated this week
- Automatically export Jupyter notebooks to various file formats (.py, .html, and more) on save. ☆77 · Updated last year
- A containerized demo of Airflow using gusty ☆39 · Updated 9 months ago
- Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB ☆20 · Updated last year
- A hands-on tutorial showing how to use Python to do anonymisation with synthetic data ☆79 · Updated 2 years ago
- A maximum-strength name parser for record linkage. ☆36 · Updated last week
- ☆38 · Updated 2 months ago
- Data-aware orchestration with Dagster, dbt, and Airbyte ☆31 · Updated 2 years ago
- ☆28 · Updated 6 years ago
- Performs unique entity estimation corresponding to Chen, Shrivastava, Steorts (2018). ☆14 · Updated 6 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Python wrapper for a C++ Double Metaphone ☆15 · Updated 2 years ago
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆27 · Updated 2 years ago
- Fast, flexible name matching for large datasets ☆71 · Updated last year
- The easiest way to integrate Kedro and Great Expectations ☆53 · Updated 2 years ago