ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆34 · Updated last year
Alternatives and similar repositories for spark-matcher
Users who are interested in spark-matcher compare it to the libraries listed below.
- Python package for deduplication/entity resolution using active learning ☆80 · Updated 9 months ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆29 · Updated last year
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 · Updated 3 years ago
- Instant search for and access to many datasets in Pyspark. ☆34 · Updated 2 years ago
- Demo on how to use Prefect with Docker ☆25 · Updated 2 years ago
- Automatically transform all categorical, date-time, NLP variables to numeric in a single line of code for any data set any size. ☆65 · Updated 4 months ago
- An abstraction layer for parameter tuning ☆35 · Updated 9 months ago
- Validation for forecasts ☆18 · Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 · Updated 3 years ago
- Build your feature store with macros right within your dbt repository ☆38 · Updated 2 years ago
- Best practices for engineering ML pipelines. ☆35 · Updated 3 years ago
- real-time data + ML pipeline ☆54 · Updated last week
- Demo of a supervised machine learning approach for Entity Resolution in graph using Neo4j GDS Link Prediction Pipelines ☆22 · Updated 3 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 4 years ago
- Using a feature store to connect the DataOps and MLOps workflows to enable collaborative teams to develop efficiently. ☆56 · Updated 2 years ago
- A Scalable Data Cleaning Library for PySpark. ☆29 · Updated 6 years ago
- 🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 · Updated 3 years ago
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆54 · Updated 9 months ago
- Pipeline components that support partial_fit. ☆46 · Updated 11 months ago
- Abstractions for feature engineering on large graphs of tabular data. ☆21 · Updated 3 weeks ago
- NitroFE is a Python feature engineering engine which provides a variety of modules designed to internally save past dependent values for … ☆106 · Updated 3 years ago
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… ☆29 · Updated 2 years ago
- Spark implementation of computing Shapley Values using monte-carlo approximation ☆74 · Updated 2 years ago
- 📈🔍 Lets Python do AB testing analysis. ☆77 · Updated 2 months ago
- Example usage of scikit-hts ☆57 · Updated 2 years ago
- ☄️ Parallel and distributed training with spaCy and Ray ☆54 · Updated last year
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 · Updated 4 years ago
- Feast AWS guide using Redshift / Spectrum / DynamoDB to build a credit scoring model ☆64 · Updated 3 years ago
- Repository for my master thesis on automated string handling ☆16 · Updated 3 years ago