ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆34 · Updated last year
Alternatives and similar repositories for spark-matcher:
Users who are interested in spark-matcher are comparing it to the libraries listed below.
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆26 · Updated last year
- real-time data + ML pipeline ☆54 · Updated last month
- ☆15 · Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟 ☆53 · Updated 3 years ago
- An abstraction layer for parameter tuning ☆35 · Updated 6 months ago
- Python package for deduplication/entity resolution using active learning ☆76 · Updated 6 months ago
- Repo contains Jupyter notebooks compiled during my review of the programming books listed. ☆13 · Updated 2 years ago
- A PaaS End-to-End ML Setup with Metaflow, Serverless and SageMaker. ☆37 · Updated 4 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 · Updated 2 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 · Updated 2 months ago
- Demo on how to use Prefect with Docker ☆25 · Updated 2 years ago
- Demo of a supervised machine learning approach for Entity Resolution in graph using Neo4j GDS Link Prediction Pipelines ☆22 · Updated 2 years ago
- Interactive notebooks containing demonstration code of the splink library ☆37 · Updated last year
- JedAI-WebApp is a GUI that facilitates the execution of JedAI. JedAI is an open source, high scalability toolkit that offers out-of-the-b… ☆23 · Updated last year
- Repository for my master thesis on automated string handling ☆16 · Updated 3 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 3 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 · Updated 11 months ago
- Exploring some issues related to churn ☆17 · Updated 11 months ago
- Build your feature store with macros right within your dbt repository ☆38 · Updated 2 years ago
- An implementation of a full two-step recommendation pipeline applied on the Kaggle H&M data ☆22 · Updated 2 years ago
- Pipeline components that support partial_fit. ☆45 · Updated 7 months ago
- ☆32 · Updated 3 years ago
- 🚕 Self-contained demo using Redpanda, Materialize, River, Redis, and Streamlit to predict taxi trip durations ☆47 · Updated last year
- Similarity encoding of dirty categorical variables (strings) ☆20 · Updated 6 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆104 · Updated last year
- Official Repository for EvalRS @ KDD 2023: a Rounded Evaluation of Recommender Systems ☆30 · Updated last year
- Abstractions for feature engineering on large graphs of tabular data. ☆21 · Updated last month
- Projects developed by Domino's R&D team ☆76 · Updated 2 years ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 · Updated 5 months ago