Record matching and entity resolution at scale in Spark
☆36 · Oct 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-matcher
Users that are interested in spark-matcher are comparing it to the libraries listed below.
- mercury-monitoring is a library to monitor data and model drift ☆16 · Mar 19, 2026 · Updated last month
- Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity res… ☆17 · Nov 18, 2020 · Updated 5 years ago
- A Python 3 library developed in C++ that enables efficient storage and querying of sets of sets. It can be used to perform fast document … ☆13 · Feb 6, 2026 · Updated 2 months ago
- Spark Monitoring ☆13 · Feb 28, 2023 · Updated 3 years ago
- Reels is a library for analyzing sequences of events from transactional data to predict when related target events may occur in the futur… ☆15 · Feb 17, 2026 · Updated 2 months ago
- SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems. ☆152 · Apr 19, 2025 · Updated last year
- mercury-explainability is a library with implementations of different state-of-the-art methods in the field of explainability. They are d… ☆17 · Mar 26, 2025 · Updated last year
- LEMON: Explainable Entity Matching ☆19 · Apr 6, 2022 · Updated 4 years ago
- Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and… ☆17 · Dec 20, 2023 · Updated 2 years ago
- Fuzzy matching function in spark (https://spark-packages.org/package/itspawanbhardwaj/spark-fuzzy-matching) ☆24 · Dec 30, 2019 · Updated 6 years ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆511 · Jan 9, 2026 · Updated 3 months ago
- A simple command line interface to the datamade/dedupe library. ☆43 · Dec 26, 2022 · Updated 3 years ago
- ☆17 · Feb 15, 2023 · Updated 3 years ago
- ☆11 · Apr 2, 2021 · Updated 5 years ago
- A list of free data matching and record linkage software. ☆403 · Feb 21, 2024 · Updated 2 years ago
- Stanford Entity-Resolution Framework ☆24 · Jun 23, 2018 · Updated 7 years ago
- Continuous Benchmark of Filtering methods for Entity Resolution ☆11 · Jul 20, 2025 · Updated 8 months ago
- UI for JedAI Toolkit ☆17 · May 20, 2022 · Updated 3 years ago
- Similarity and distance measures for clustering and record linkage applications in R ☆18 · Sep 23, 2025 · Updated 6 months ago
- Any content related to any talks. ☆12 · Dec 7, 2020 · Updated 5 years ago
- mercury-graph is a Python library that offers graph analytics capabilities with a technology-agnostic API. ☆38 · Mar 21, 2025 · Updated last year
- Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application… ☆24 · Jun 4, 2022 · Updated 3 years ago
- LSH/Hypercube kNN and KMeans++ Clustering on polygonic curves and time series ☆15 · Feb 7, 2022 · Updated 4 years ago
- Bachelor's Thesis on Adversarial Machine Learning Attacks and Defences ☆17 · Nov 18, 2022 · Updated 3 years ago
- A Python wrapper over the GraphGen system ☆38 · Sep 15, 2017 · Updated 8 years ago
- Docker Monitoring and Management Client ☆26 · Feb 12, 2015 · Updated 11 years ago
- Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark SQL ☆31 · Jan 8, 2026 · Updated 3 months ago
- A collection of python utility functions ☆11 · Mar 30, 2026 · Updated 2 weeks ago
- Strabon is a fully implemented semantic geospatial database system that can be used to store linked geospatial data expressed in RDF and … ☆24 · Jun 28, 2024 · Updated last year
- Tutorial code and data for the entity resolution workshops. ☆45 · Jul 15, 2015 · Updated 10 years ago
- Python implementation of Histogrammar, a package for creating histograms with Numpy, Pandas and Spark. ☆36 · Sep 2, 2025 · Updated 7 months ago
- Samples of authenticating to an Azure Key Vault vault ☆13 · May 10, 2022 · Updated 3 years ago
- Collect and aggregate on spark events for profitz ☆10 · Apr 22, 2022 · Updated 3 years ago
- This project implements different Deep Autoencoder for Collaborative Filtering for Recommendation Systems in Keras ☆53 · Nov 28, 2019 · Updated 6 years ago
- Material for the lecture Statistical Computing ☆11 · Jan 1, 2026 · Updated 3 months ago
- Entity Matching Model solves the problem of matching company names between two possibly very large datasets. ☆92 · Mar 11, 2026 · Updated last month
- Pymodeltime offers a unified framework tailored to address a broad spectrum of requirements, including time series forecasting and variou… ☆14 · Feb 5, 2024 · Updated 2 years ago
- ☆10 · Oct 19, 2020 · Updated 5 years ago
- ☆12 · Sep 4, 2017 · Updated 8 years ago