ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆31 Updated last year
Related projects
Alternatives and complementary repositories for spark-matcher
- Instant search for and access to many datasets in PySpark. ☆34 Updated 2 years ago
- Python package for deduplication/entity resolution using active learning ☆78 Updated 2 months ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆35 Updated 9 months ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆26 Updated 11 months ago
- An abstraction layer for parameter tuning ☆36 Updated 2 months ago
- ☆15 Updated 2 years ago
- This repo is an approach to TDD in machine learning model operation. It covers project structure, testing essentials using pytest with Gi… ☆14 Updated 3 years ago
- Demo on how to use Prefect with Docker ☆26 Updated 2 years ago
- Function for automatically detecting Simpson's Paradox ☆18 Updated 3 years ago
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- Best practices for engineering ML pipelines. ☆37 Updated 2 years ago
- Interactive notebooks containing demonstration code of the splink library ☆38 Updated 10 months ago
- A Scalable Data Cleaning Library for PySpark. ☆26 Updated 5 years ago
- Template for data pipelines, ML workflows, API dev and monitoring ☆45 Updated 10 months ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 Updated 3 years ago
- This project focuses on DeepER, a deep learning framework for entity resolution (record deduplication). It examines how DeepER performs o… ☆45 Updated 6 years ago
- Deploy A/B testing infrastructure in a containerized microservice architecture for Machine Learning applications. ☆39 Updated last year
- Abstractions for feature engineering on large graphs of tabular data. ☆22 Updated last week
- Automatically transform all categorical, date-time, NLP variables to numeric in a single line of code for any data set any size. ☆64 Updated 9 months ago
- ☆16 Updated 3 years ago
- Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations" ☆16 Updated 2 years ago
- Repository for my master thesis on automated string handling ☆16 Updated 3 years ago
- Exploring some issues related to churn ☆17 Updated 8 months ago
- Tutorial code and data for the entity resolution workshops. ☆45 Updated 9 years ago
- Demo of Streamlit application with Databricks SQL Endpoint ☆33 Updated 2 years ago
- ☆32 Updated 3 years ago
- MinHash implementation in Python ☆11 Updated 2 months ago
- Productivity Utilities for Data Science with Python Notebooks ☆5 Updated 4 years ago