ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆34 Updated last year
Alternatives and similar repositories for spark-matcher
Users who are interested in spark-matcher are comparing it to the libraries listed below.
- An abstraction layer for parameter tuning ☆35 Updated 8 months ago
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆28 Updated last year
- Python package for deduplication/entity resolution using active learning ☆79 Updated 8 months ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 Updated 5 months ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 Updated 2 years ago
- real-time data + ML pipeline ☆54 Updated last month
- Automatically transform all categorical, date-time, NLP variables to numeric in a single line of code for any data set any size. ☆65 Updated 3 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated 2 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- ☆15 Updated 2 years ago
- Projects developed by Domino's R&D team ☆76 Updated 3 years ago
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- Pandas helper functions ☆30 Updated 2 years ago
- OptimalFlow is an omni-ensemble and scalable automated machine learning Python toolkit, which uses Pipeline Cluster Traversal Experiments… ☆27 Updated last year
- ☄️ Parallel and distributed training with spaCy and Ray ☆54 Updated last year
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 Updated 2 years ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 Updated 8 months ago
- A unified wrapper for various ML frameworks - to have one uniform scikit-learn format for predict and predict_proba functions. ☆48 Updated 4 months ago
- Extra functionalities for river ☆14 Updated 11 months ago
- this repo might get accepted ☆28 Updated 4 years ago
- ☆16 Updated 2 years ago
- SparkER: an Entity Resolution framework for Apache Spark ☆64 Updated last year
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- A Scalable Data Cleaning Library for PySpark. ☆27 Updated 6 years ago
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… ☆28 Updated 2 years ago
- Guide for applying Unit Testing in data-driven projects ☆19 Updated 5 years ago
- Demo on how to use Prefect with Docker ☆25 Updated 2 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated last year