moj-analytical-services / splink_demos
Interactive notebooks containing demonstration code of the splink library
☆38 Updated 9 months ago
Related projects
Alternatives and complementary repositories for splink_demos
- A browser user interface for manual labeling of record pairs. ☆41 Updated last year
- Record matching and entity resolution at scale in Spark ☆31 Updated last year
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- A tutorial on entity resolution (record linkage or de-duplication) ☆61 Updated 4 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- PySpark phonetic and string matching algorithms ☆35 Updated 8 months ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆54 Updated last year
- A Python package to create a database on the platform using our moj data warehousing framework ☆21 Updated 2 months ago
- PySpark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other do… ☆10 Updated last year
- Fast, flexible name matching for large datasets ☆70 Updated 10 months ago
- A maximum-strength name parser for record linkage. ☆32 Updated 3 months ago
- Entity Matching Model solves the problem of matching company names between two possibly very large datasets. ☆53 Updated last month
- Notebooks configured to be run with Binder, usually found on my blog. ☆41 Updated last year
- Data Science Festival Workshop 7 November 2020 – Building a fashion recommender using TensorFlow/Keras with ASOS. ☆23 Updated 4 years ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆25 Updated 11 months ago
- List of entity resolution software and resources. ☆35 Updated 8 months ago
- Build your feature store with macros right within your dbt repository ☆37 Updated last year
- An abstraction layer for parameter tuning ☆36 Updated 2 months ago
- Prototype search engine for ONS bulletins ☆23 Updated 6 months ago
- ☆44 Updated 8 months ago
- Tutorial for implementing data validation in data science pipelines ☆32 Updated 2 years ago
- Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB ☆18 Updated last year
- Python package for deduplication/entity resolution using active learning ☆79 Updated 2 months ago
- Dataframe Integration with spaCy. ☆101 Updated 3 years ago
- Record linking package that fuzzy matches two Python pandas dataframes using sqlite3 fts4 ☆281 Updated 2 years ago
- Instant search for and access to many datasets in PySpark. ☆34 Updated 2 years ago
- A convenient way to anonymize your data for analytics ☆20 Updated 3 years ago
- Helper code to interact with Rasgo via our SDK, PyRasgo ☆40 Updated last year
- Course materials for our "Getting Started with NLP and spaCy" course at Talk Python ☆35 Updated 5 months ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago