Record matching and entity resolution at scale in Spark
☆36Oct 31, 2023Updated 2 years ago
Alternatives and similar repositories for spark-matcher
Users that are interested in spark-matcher are comparing it to the libraries listed below
- Repository for performing Blocking using Deep Learning based on the paper "Deep Learning for Blocking in Entity Matching: A Design Space …☆32Apr 5, 2023Updated 2 years ago
- ☆15Aug 11, 2022Updated 3 years ago
- Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and…☆17Dec 20, 2023Updated 2 years ago
- LEMON: Explainable Entity Matching☆19Apr 6, 2022Updated 3 years ago
- ☆18Nov 9, 2025Updated 4 months ago
- Asynchronous actions for PySpark☆48Dec 2, 2021Updated 4 years ago
- Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity res…☆17Nov 18, 2020Updated 5 years ago
- Fuzzy matching function in Spark (https://spark-packages.org/package/itspawanbhardwaj/spark-fuzzy-matching)☆24Dec 30, 2019Updated 6 years ago
- Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application…☆25Jun 4, 2022Updated 3 years ago
- SparkER: an Entity Resolution framework for Apache Spark☆65Mar 29, 2024Updated last year
- Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark SQL☆31Jan 8, 2026Updated 2 months ago
- A Python package with explanation methods for extraction of feature interactions from predictive models☆33Nov 18, 2023Updated 2 years ago
- Material for the lecture Statistical Computing☆11Jan 1, 2026Updated 2 months ago
- ☆10Jun 29, 2021Updated 4 years ago
- A universal messaging library for cross-platform applications (Chrome extension, Web, Mobile, Iframe,...)☆15Oct 10, 2025Updated 4 months ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎☆512Jan 9, 2026Updated 2 months ago
- A list of free data matching and record linkage software.☆401Feb 21, 2024Updated 2 years ago
- Automated Continuous Data Quality Measurement☆12Nov 15, 2023Updated 2 years ago
- Simple Python script that converts all Excel files (xls, xlsx, xlsm, csv) in a directory into xlsb files.☆10Mar 13, 2023Updated 2 years ago
- Framework for studying cryptographic hash functions using SAT.☆10Dec 21, 2021Updated 4 years ago
- ☆12Oct 18, 2022Updated 3 years ago
- This repo contains all the code from the articles that I have published on Medium☆10Feb 10, 2021Updated 5 years ago
- Fundamental Accounting Concept Relations validation for International Financial Reporting Standards (IFRS).☆14Sep 20, 2018Updated 7 years ago
- Python library for the simulation of probabilistic circuits.☆11Feb 1, 2026Updated last month
- Workflow Automation with Microsoft Power Automate, 2nd Edition, Published by Packt☆17Jan 18, 2023Updated 3 years ago
- Deduplicates property owners in Massachusetts using the MassGIS standardized assessors' parcel dataset and the OpenCorporates Bulk Data p…☆13Jan 26, 2026Updated last month
- A maximum-strength name parser for record linkage.☆39Sep 3, 2025Updated 6 months ago
- Data validation library for PySpark 3.0.0☆33Nov 11, 2022Updated 3 years ago
- Python interface to the FDIC's API for publicly available bank data☆12Apr 15, 2023Updated 2 years ago
- Search volume for Amazon completion service☆13Feb 5, 2019Updated 7 years ago
- An Elder Scrolls neural name generator trained using PyTorch☆10Jan 29, 2019Updated 7 years ago
- Collect and aggregate on spark events for profitz☆10Apr 22, 2022Updated 3 years ago
- R dashboard as a designer☆10Oct 29, 2015Updated 10 years ago
- A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning …☆45May 6, 2022Updated 3 years ago
- An implementation of a neural network training routine using derivative information in Pytorch.☆10Dec 19, 2020Updated 5 years ago
- ☆12Mar 1, 2024Updated 2 years ago
- This repository will help you learn Databricks concepts with the help of examples. It will include all the important topics which…☆104Sep 26, 2025Updated 5 months ago
- Tutorial repo for the article "ML in Production"☆12Sep 8, 2018Updated 7 years ago
- CSC 424 Advanced Database Management Systems☆16Jan 1, 2020Updated 6 years ago