ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆34 Updated last year
Alternatives and similar repositories for spark-matcher
Users that are interested in spark-matcher are comparing it to the libraries listed below
- Python package for deduplication/entity resolution using active learning ☆80 Updated 9 months ago
- real-time data + ML pipeline ☆54 Updated this week
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆28 Updated last year
- ☆15 Updated 2 years ago
- An abstraction layer for parameter tuning ☆35 Updated 9 months ago
- Demo on how to use Prefect with Docker ☆25 Updated 2 years ago
- Repository for my master thesis on automated string handling ☆16 Updated 3 years ago
- 📈🔍 Lets Python do AB testing analysis. ☆77 Updated last month
- Automatically transform all categorical, date-time, NLP variables to numeric in a single line of code for any data set any size. ☆65 Updated 4 months ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- Projects developed by Domino's R&D team ☆76 Updated 3 years ago
- This project focuses on DeepER, a deep learning framework for entity resolution (record deduplication). It examines how DeepER performs o… ☆47 Updated 7 years ago
- Pipeline components that support partial_fit. ☆46 Updated 10 months ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- Similarity encoding of dirty categorical variables (strings) ☆20 Updated 6 years ago
- MinHash implementation in Python ☆11 Updated 9 months ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆54 Updated 8 months ago
- Official Repository for EvalRS @ KDD 2023: a Rounded Evaluation of Recommender Systems ☆30 Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated last year
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 Updated 3 years ago
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 Updated 2 years ago
- Interactive notebooks containing demonstration code of the splink library ☆38 Updated last year
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 Updated 4 years ago
- Best practices for engineering ML pipelines. ☆35 Updated 2 years ago
- ☆32 Updated 3 years ago
- A PaaS End-to-End ML Setup with Metaflow, Serverless and SageMaker. ☆37 Updated 4 years ago
- mercury-graph is a Python library that offers graph analytics capabilities with a technology-agnostic API. ☆30 Updated 2 months ago