moj-analytical-services / splink_demos
Interactive notebooks containing demonstration code of the splink library
☆38 · Updated last year
Alternatives and similar repositories for splink_demos
Users interested in splink_demos also compare it with the repositories listed below.
- A browser user interface for manual labeling of record pairs. ☆47 · Updated last year
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 3 years ago
- A python package to create a database on the platform using our moj data warehousing framework ☆21 · Updated 8 months ago
- Record matching and entity resolution at scale in Spark ☆34 · Updated last year
- A scikit-learn compatible estimator based on business rules, with an interactive dashboard included ☆28 · Updated 3 years ago
- Entity Matching Model solves the problem of matching company names between two possibly very large datasets. ☆72 · Updated 3 months ago
- Fast, flexible name matching for large datasets ☆72 · Updated last week
- A tutorial on entity resolution (record linkage or de-duplication) ☆63 · Updated 4 years ago
- Data Scientist code test ☆19 · Updated 4 years ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆28 · Updated last year
- Blog post on ETL pipelines with Airflow ☆23 · Updated 4 years ago
- A maximum-strength name parser for record linkage. ☆37 · Updated 3 weeks ago
- Simplifies use of the Dedupe library via Pandas ☆136 · Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other do… ☆10 · Updated last year
- Python package for deduplication/entity resolution using active learning ☆80 · Updated 9 months ago
- Build your feature store with macros right within your dbt repository ☆38 · Updated 2 years ago
- ☆11 · Updated 3 years ago
- ☆28 · Updated 6 years ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆54 · Updated last year
- Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb ☆20 · Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated last year
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 · Updated 3 years ago
- Buy Till You Die and Customer Lifetime Value statistical models in Python. ☆117 · Updated last year
- Python wrapper for a C++ Double Metaphone ☆15 · Updated 3 weeks ago
- Predict whether a student will correctly answer a problem based on past performance using automated feature engineering ☆32 · Updated 4 years ago
- Demo of Streamlit application with Databricks SQL Endpoint ☆35 · Updated 2 years ago
- Fully unit-tested utility functions for data engineering. Python 3 only. ☆17 · Updated 9 months ago
- Notebooks configured to be run with Binder, usually found on my blog. ☆42 · Updated 2 years ago
- Ibis analytics, with Ibis (and more!) ☆22 · Updated 8 months ago