ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆36 · Oct 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-matcher
Users who are interested in spark-matcher are comparing it to the libraries listed below.
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆36 · Dec 3, 2023 · Updated 2 years ago
- ☆15 · Aug 11, 2022 · Updated 3 years ago
- Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and… ☆17 · Dec 20, 2023 · Updated 2 years ago
- LEMON: Explainable Entity Matching ☆19 · Apr 6, 2022 · Updated 3 years ago
- ☆18 · Nov 9, 2025 · Updated 3 months ago
- Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset. ☆27 · Oct 30, 2021 · Updated 4 years ago
- Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity res… ☆17 · Nov 18, 2020 · Updated 5 years ago
- Stanford Entity-Resolution Framework ☆24 · Jun 23, 2018 · Updated 7 years ago
- Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application… ☆25 · Jun 4, 2022 · Updated 3 years ago
- A universal messaging library for cross-platform applications (Chrome extension, Web, Mobile, Iframe,...) ☆15 · Oct 10, 2025 · Updated 4 months ago
- SparkER: an Entity Resolution framework for Apache Spark ☆65 · Mar 29, 2024 · Updated last year
- A Python package with explanation methods for extraction of feature interactions from predictive models ☆33 · Nov 18, 2023 · Updated 2 years ago
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆510 · Jan 9, 2026 · Updated last month
- A list of free data matching and record linkage software. ☆401 · Feb 21, 2024 · Updated last year
- An open source, high scalability toolkit in Java for Entity Resolution. ☆222 · Jul 12, 2025 · Updated 7 months ago
- Python library for the simulation of probabilistic circuits. ☆11 · Feb 1, 2026 · Updated 2 weeks ago
- Automated Continuous Data Quality Measurement ☆12 · Nov 15, 2023 · Updated 2 years ago
- Deduplicates property owners in Massachusetts using the MassGIS standardized assessors' parcel dataset and the OpenCorporates Bulk Data p… ☆13 · Jan 26, 2026 · Updated 3 weeks ago
- Workflow Automation with Microsoft Power Automate, 2nd Edition, Published by Packt ☆17 · Jan 18, 2023 · Updated 3 years ago
- SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems. ☆153 · Apr 19, 2025 · Updated 9 months ago
- Framework for studying cryptographic hash functions using SAT. ☆10 · Dec 21, 2021 · Updated 4 years ago
- OpenTelemetry layer for HTTP/gRPC services ☆10 · Feb 4, 2026 · Updated last week
- Python interface to the FDIC's API for publicly available bank data ☆12 · Apr 15, 2023 · Updated 2 years ago
- ☆14 · Nov 27, 2025 · Updated 2 months ago
- Search volume for Amazon completion service ☆13 · Feb 5, 2019 · Updated 7 years ago
- R dashboard as a designer ☆10 · Oct 29, 2015 · Updated 10 years ago
- CSC 424 Advanced Database Management Systems ☆16 · Jan 1, 2020 · Updated 6 years ago
- Interactive notebooks containing demonstration code of the splink library ☆40 · Jan 19, 2024 · Updated 2 years ago
- Friday Forecasting Talks materials ☆11 · May 24, 2024 · Updated last year
- ☆10 · Oct 19, 2020 · Updated 5 years ago
- A distributed execution framework built upon lunatic. ☆16 · Jan 19, 2024 · Updated 2 years ago
- ChatGPT solutions for the MLE interview ☆14 · Dec 9, 2022 · Updated 3 years ago
- ☆13 · Jul 25, 2024 · Updated last year
- ☆18 · Jan 14, 2020 · Updated 6 years ago
- Data Governance app for Splunk ☆12 · Oct 19, 2023 · Updated 2 years ago
- Wikipedia "people" Images Dataset Downloader ☆11 · Dec 3, 2023 · Updated 2 years ago
- Very basic solar regression ☆16 · Updated this week
- OTP generation & validation library for Rust ☆14 · Dec 4, 2025 · Updated 2 months ago