Record matching and entity resolution at scale in Spark
☆36 · Oct 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-matcher
Users that are interested in spark-matcher are comparing it to the libraries listed below.
- ☆15 · Aug 11, 2022 · Updated 3 years ago
- Entity resolution using zero labeled examples ☆33 · Jun 29, 2024 · Updated last year
- Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity res… ☆18 · Nov 18, 2020 · Updated 5 years ago
- Asynchronous actions for PySpark ☆47 · Dec 2, 2021 · Updated 4 years ago
- Spark Monitoring ☆14 · Feb 28, 2023 · Updated 3 years ago
- SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems. ☆153 · Apr 19, 2025 · Updated last year
- LEMON: Explainable Entity Matching ☆19 · Apr 6, 2022 · Updated 4 years ago
- Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and… ☆17 · Dec 20, 2023 · Updated 2 years ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ ☆512 · Jan 9, 2026 · Updated 4 months ago
- ☆11 · Apr 2, 2021 · Updated 5 years ago
- ☆10 · Jun 29, 2021 · Updated 4 years ago
- This repository stores all scripts to analyze MEG data from the eponymous manuscript. ☆27 · Dec 7, 2016 · Updated 9 years ago
- Ordeq simplifies IO and modularizes pipeline logic. ☆41 · Dec 19, 2025 · Updated 5 months ago
- Soufflé Datalog Language Server. Add smart features to the Soufflé Datalog Language with the help of LSP in a VS code plugin ☆14 · Sep 30, 2023 · Updated 2 years ago
- Continuous Benchmark of Filtering methods for Entity Resolution ☆11 · Jul 20, 2025 · Updated 10 months ago
- Similarity and distance measures for clustering and record linkage applications in R ☆19 · Sep 23, 2025 · Updated 8 months ago
- Any content related to any talks. ☆12 · Dec 7, 2020 · Updated 5 years ago
- ☆18 · Nov 9, 2025 · Updated 6 months ago
- SparkER: an Entity Resolution framework for Apache Spark ☆66 · Mar 29, 2024 · Updated 2 years ago
- Efficient String Comparison Functions and Fuzzy String Matching ☆20 · Sep 21, 2025 · Updated 8 months ago
- A Generalized Data Cleaning System ☆51 · Apr 28, 2016 · Updated 10 years ago
- Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application… ☆24 · Jun 4, 2022 · Updated 3 years ago
- Bachelor's Thesis on Adversarial Machine Learning Attacks and Defences ☆17 · Nov 18, 2022 · Updated 3 years ago
- Docker Monitoring and Management Client ☆26 · Feb 12, 2015 · Updated 11 years ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens) ☆12 · Oct 11, 2024 · Updated last year
- Ontop Framework ☆21 · Jun 9, 2018 · Updated 7 years ago
- Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark SQL ☆31 · Jan 8, 2026 · Updated 4 months ago
- An RMarkdown thesis template for the University of Amsterdam ☆29 · Feb 19, 2024 · Updated 2 years ago
- ☆13 · Feb 10, 2023 · Updated 3 years ago
- Tutorial code and data for the entity resolution workshops. ☆45 · Jul 15, 2015 · Updated 10 years ago
- ☆10 · Feb 2, 2023 · Updated 3 years ago
- An implementation of a neural network training routine using derivative information in Pytorch. ☆11 · Dec 19, 2020 · Updated 5 years ago
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆19 · Updated this week
- Samples of authenticating to an Azure Key Vault vault ☆13 · May 10, 2022 · Updated 4 years ago
- Implementation of TANE for experimental purposes ☆15 · Apr 29, 2022 · Updated 4 years ago
- A Python package with explanation methods for extraction of feature interactions from predictive models ☆33 · Nov 18, 2023 · Updated 2 years ago
- Python wrapper for a C++ Double Metaphone ☆15 · Jan 12, 2026 · Updated 4 months ago
- Algebird's HyperLogLog support for Apache Spark. ☆10 · Jul 20, 2017 · Updated 8 years ago
- Material for the lecture Statistical Computing ☆12 · Jan 1, 2026 · Updated 4 months ago