Record matching and entity resolution at scale in Spark
☆36 · Oct 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for spark-matcher
Users interested in spark-matcher are comparing it to the libraries listed below.
- Repository for performing Blocking using Deep Learning based on the paper "Deep Learning for Blocking in Entity Matching: A Design Space … · ☆30 · Apr 5, 2023 · Updated 3 years ago
- An End-to-End Evaluation Framework for Entity Resolution Systems · ☆37 · Dec 3, 2023 · Updated 2 years ago
- MinHash implementation in Python · ☆12 · Aug 24, 2024 · Updated last year
- Minoan ER is an Entity Resolution (ER) framework, built by researchers in Crete (the land of the ancient Minoan civilization). Entity res… · ☆18 · Nov 18, 2020 · Updated 5 years ago
- Asynchronous actions for PySpark · ☆47 · Dec 2, 2021 · Updated 4 years ago
- Spark Monitoring · ☆14 · Feb 28, 2023 · Updated 3 years ago
- LEMON: Explainable Entity Matching · ☆19 · Apr 6, 2022 · Updated 4 years ago
- Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and… · ☆17 · Dec 20, 2023 · Updated 2 years ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎ · ☆511 · Jan 9, 2026 · Updated 5 months ago
- ☆13 · Aug 10, 2023 · Updated 2 years ago
- ☆18 · Apr 27, 2026 · Updated last month
- A list of free data matching and record linkage software. · ☆406 · Feb 21, 2024 · Updated 2 years ago
- An open source, high scalability toolkit in Java for Entity Resolution. · ☆224 · Jul 12, 2025 · Updated 11 months ago
- Python package for deduplication/entity resolution using active learning · ☆82 · Aug 24, 2024 · Updated last year
- Bluetooth Indoor Positioning with DNNs · ☆13 · Mar 28, 2022 · Updated 4 years ago
- Continuous Benchmark of Filtering methods for Entity Resolution · ☆11 · Jul 20, 2025 · Updated 10 months ago
- UI for JedAI Toolkit · ☆17 · May 20, 2022 · Updated 4 years ago
- Similarity and distance measures for clustering and record linkage applications in R · ☆19 · Sep 23, 2025 · Updated 8 months ago
- ☆18 · Nov 9, 2025 · Updated 7 months ago
- SparkER: an Entity Resolution framework for Apache Spark · ☆66 · Mar 29, 2024 · Updated 2 years ago
- Template for building a Singer Target · ☆20 · Sep 3, 2024 · Updated last year
- Efficient String Comparison Functions and Fuzzy String Matching · ☆20 · Sep 21, 2025 · Updated 8 months ago
- A Generalized Data Cleaning System · ☆52 · Apr 28, 2016 · Updated 10 years ago
- Bachelor's Thesis on Adversarial Machine Learning Attacks and Defences · ☆17 · Nov 18, 2022 · Updated 3 years ago
- Coloring terminal text with intensities (used for plotting probability, entropy with tokens) · ☆12 · Oct 11, 2024 · Updated last year
- JedAI-WebApp is a GUI that facilitates the execution of JedAI. JedAI is an open source, high scalability toolkit that offers out-of-the-b… · ☆26 · Apr 14, 2023 · Updated 3 years ago
- Strabon is a fully implemented semantic geospatial database system that can be used to store linked geospatial data expressed in RDF and … · ☆26 · Jun 28, 2024 · Updated last year
- ☆13 · Feb 10, 2023 · Updated 3 years ago
- Tutorial code and data for the entity resolution workshops. · ☆45 · Jul 15, 2015 · Updated 10 years ago
- mrhyde-tools gem - static site quick starter script wizard .:. jekyll command line tool · ☆14 · Aug 2, 2022 · Updated 3 years ago
- A swarm of LLM agents that will help you test, document, and productionize your code! · ☆19 · Jun 8, 2026 · Updated last week
- Python implementation of Histogrammar, a package for creating histograms with Numpy, Pandas and Spark. · ☆36 · Sep 2, 2025 · Updated 9 months ago
- My presentation at ODSC India 2018 about Deep Learning with Apache Spark · ☆27 · Sep 1, 2018 · Updated 7 years ago
- Samples of authenticating to an Azure Key Vault vault · ☆13 · May 10, 2022 · Updated 4 years ago
- LIDA: Lightweight Interactive Dialogue Annotator (in EMNLP 2019) · ☆10 · Oct 18, 2021 · Updated 4 years ago
- A Python package with explanation methods for extraction of feature interactions from predictive models · ☆34 · Nov 18, 2023 · Updated 2 years ago
- Python wrapper for a C++ Double Metaphone · ☆15 · Jan 12, 2026 · Updated 5 months ago
- ☆10 · Oct 19, 2020 · Updated 5 years ago
- Bashrs: Rust-to-Shell Transpiler for Deterministic Bootstrap Scripts · ☆36 · May 4, 2026 · Updated last month