ing-bank/spark-matcher

Record matching and entity resolution at scale in Spark

☆36

Alternatives and similar repositories for spark-matcher

Users interested in spark-matcher often compare it to the libraries listed below.

OlivierBinette / er-evaluation
An End-to-End Evaluation Framework for Entity Resolution Systems
☆38 · Dec 3, 2023 · Updated 2 years ago
qcri / DeepBlocker
Repository for performing Blocking using Deep Learning based on the paper "Deep Learning for Blocking in Entity Matching: A Design Space …
☆30 · Apr 5, 2023 · Updated 3 years ago
fritshermans / deduplipy
Python package for deduplication/entity resolution using active learning
☆82 · Aug 24, 2024 · Updated last year
BBVA / mercury-settrie
A Python 3 library developed in C++ that enables efficient storage and querying of sets of sets. It can be used to perform fast document …
☆13 · Jun 18, 2026 · Updated last month
stephanecollot / sparkmon
Spark Monitoring
☆14 · Feb 28, 2023 · Updated 3 years ago
ing-bank / probatus
SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems.
☆154 · Apr 19, 2025 · Updated last year
NilsBarlaug / lemon
LEMON: Explainable Entity Matching
☆19 · Apr 6, 2022 · Updated 4 years ago
falkenbach / actor-glassdoor-jobs
Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and…
☆17 · Dec 20, 2023 · Updated 2 years ago
ing-bank / popmon
Monitor the stability of a Pandas or Spark dataframe ⚙︎
☆512 · Jan 9, 2026 · Updated 6 months ago
dssg / pgdedupe
A simple command line interface to the datamade/dedupe library.
☆43 · Dec 26, 2022 · Updated 3 years ago
google-marketing-solutions / web-performance-lab
☆19 · Jun 23, 2026 · Updated last month
JoonyoungYi / LLORMA-tensorflow
The tensorflow prototype of "Local Low-rank Matrix Approximation" (LLORMA)
☆10 · Jan 11, 2019 · Updated 7 years ago
scravy / pysparkextra
☆10 · Jun 29, 2021 · Updated 5 years ago
scify / JedAIToolkit
An open source, high scalability toolkit in Java for Entity Resolution.
☆226 · Jul 12, 2025 · Updated last year
LIBRA-AI-Tech / sensors-positioning
Bluetooth Indoor Positioning with DNNs
☆13 · Mar 28, 2022 · Updated 4 years ago
gpapadis / ContinuousFilteringBenchmark
Continuous Benchmark of Filtering methods for Entity Resolution
☆11 · Jul 20, 2025 · Updated last year
scify / jedai-ui
UI for JedAI Toolkit
☆17 · May 20, 2022 · Updated 4 years ago
trevorprater / serf
Stanford Entity-Resolution Framework
☆24 · Jun 23, 2018 · Updated 8 years ago
mohanakrishnavh / pyspark-tutorial
☆19 · Nov 9, 2025 · Updated 8 months ago
ngmarchant / comparator
Similarity and distance measures for clustering and record linkage applications in R
☆19 · Sep 23, 2025 · Updated 10 months ago
baziotis / Talks
Any content related to any talks.
☆12 · Dec 7, 2020 · Updated 5 years ago
Gaglia88 / sparker
SparkER: an Entity Resolution framework for Apache Spark
☆67 · Mar 29, 2024 · Updated 2 years ago
readthedocs-examples / example-jupyter-book
An example Jupyter Book project integrated with Read the Docs
☆20 · Jan 12, 2026 · Updated 6 months ago
goodwillpunning / hyperleaup
Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark SQL
☆30 · Jan 8, 2026 · Updated 6 months ago
mayer79 / statistical_computing_material
Material for the lecture Statistical Computing
☆12 · Jan 1, 2026 · Updated 6 months ago
OlivierBinette / StringCompare
Efficient String Comparison Functions and Fuzzy String Matching
☆21 · Sep 21, 2025 · Updated 10 months ago
yaronshap / GloballyConsistentRules
Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application…
☆24 · Jun 4, 2022 · Updated 4 years ago
vitillo / spark-hyperloglog
Algebird's HyperLogLog support for Apache Spark.
☆10 · Jul 20, 2017 · Updated 9 years ago
AristiPap / Adversarial_ML_Research
Bachelor's Thesis on Adversarial Machine Learning Attacks and Defences
☆17 · Nov 18, 2022 · Updated 3 years ago
swairshah / Intensify
coloring terminal text with intensities (used for plotting probability, entropy with tokens)
☆12 · Oct 11, 2024 · Updated last year
ConstantB / ontop-spatial
Ontop Framework
☆21 · Jun 9, 2018 · Updated 8 years ago
getml / getml-demo
Showcase notebooks for getML
☆19 · Jan 20, 2026 · Updated 6 months ago
pyartemis / artemis
A Python package with explanation methods for extraction of feature interactions from predictive models
☆34 · Nov 18, 2023 · Updated 2 years ago
fugue-project / triad
A collection of python utility functions
☆11 · May 8, 2026 · Updated 2 months ago
DistrictDataLabs / entity-resolution
Tutorial code and data for the entity resolution workshops.
☆45 · Jul 15, 2015 · Updated 11 years ago
kyegomez / dev-swarm
A swarm of LLM agents that will help you test, document, and productionize your code!
☆20 · Updated this week
shauli-ravfogel / conformal-prediction
☆10 · Feb 2, 2023 · Updated 3 years ago
histogrammar / histogrammar-python
Python implementation of Histogrammar, a package for creating histograms with Numpy, Pandas and Spark.
☆36 · Sep 2, 2025 · Updated 10 months ago
stefan-jansen / topic-modeling
☆14 · Feb 10, 2023 · Updated 3 years ago