Record matching and entity resolution at scale in Spark
☆36Oct 31, 2023Updated 2 years ago
Alternatives and similar repositories for spark-matcher
Users that are interested in spark-matcher are comparing it to the libraries listed below.
- Repository for performing Blocking using Deep Learning based on the paper "Deep Learning for Blocking in Entity Matching: A Design Space …☆31Apr 5, 2023Updated 2 years ago
- ☆15Aug 11, 2022Updated 3 years ago
- Spark Monitoring☆13Feb 28, 2023Updated 3 years ago
- Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and…☆17Dec 20, 2023Updated 2 years ago
- Monitor the stability of a Pandas or Spark dataframe ⚙︎☆511Jan 9, 2026Updated 2 months ago
- Optimizing Databricks Workload, published by Packt☆18Mar 2, 2026Updated 3 weeks ago
- ☆10Jun 29, 2021Updated 4 years ago
- An open source, high scalability toolkit in Java for Entity Resolution.☆222Jul 12, 2025Updated 8 months ago
- Ordeq simplifies IO and modularizes pipeline logic.☆41Dec 19, 2025Updated 3 months ago
- Python package for deduplication/entity resolution using active learning☆83Aug 24, 2024Updated last year
- UI for JedAI Toolkit☆17May 20, 2022Updated 3 years ago
- ☆18Nov 9, 2025Updated 4 months ago
- SparkER: an Entity Resolution framework for Apache Spark☆65Mar 29, 2024Updated 2 years ago
- Implementation of algorithms from the paper "Globally-Consistent Rule-Based Summary-Explanations for Machine Learning Models: Application…☆24Jun 4, 2022Updated 3 years ago
- Format Python TOML configurations.☆68Updated this week
- Docker Monitoring and Management Client☆26Feb 12, 2015Updated 11 years ago
- Showcase notebooks for getML☆19Jan 20, 2026Updated 2 months ago
- Create and manipulate Tableau Hyper files from Apache Spark DataFrames and Spark SQL☆31Jan 8, 2026Updated 2 months ago
- ☆13Feb 10, 2023Updated 3 years ago
- ☆10Feb 2, 2023Updated 3 years ago
- Tutorial code and data for the entity resolution workshops.☆45Jul 15, 2015Updated 10 years ago
- Python implementation of Histogrammar, a package for creating histograms with Numpy, Pandas and Spark.☆36Sep 2, 2025Updated 6 months ago
- LIDA: Lightweight Interactive Dialogue Annotator (in EMNLP 2019)☆10Oct 18, 2021Updated 4 years ago
- A Python package with explanation methods for extraction of feature interactions from predictive models☆33Nov 18, 2023Updated 2 years ago
- Collect and aggregate on spark events for profitz☆10Apr 22, 2022Updated 3 years ago
- Python wrapper for a C++ Double Metaphone☆15Jan 12, 2026Updated 2 months ago
- ☆16Jul 23, 2023Updated 2 years ago
- Pymodeltime offers a unified framework tailored to address a broad spectrum of requirements, including time series forecasting and variou…☆14Feb 5, 2024Updated 2 years ago
- Entity Matching Model solves the problem of matching company names between two possibly very large datasets.☆92Mar 11, 2026Updated 2 weeks ago
- A swarm of LLM agents that will help you test, document, and productionize your code!☆16Mar 22, 2026Updated last week
- ☆10Oct 19, 2020Updated 5 years ago
- Generator for graph of transactions. Is kind of optimized for large graph generations. Contains graph structure generation, nodes informa…☆30Aug 13, 2023Updated 2 years ago
- Experimental version of jxbz/agd implementing support for bias terms, affine parameters, transformers, etc.☆12Jul 30, 2023Updated 2 years ago
- ☆10Oct 12, 2021Updated 4 years ago
- Data engineering pipeline for the household COVID-19 Infection Survey (CIS)☆10Jul 18, 2023Updated 2 years ago
- provide preprocessing platform for Lucene indexing and comprehensive Learning-to-Rank modules☆13Feb 16, 2018Updated 8 years ago
- Subset Met Office MOGREPS-UK and UKV on AWS EC2☆12Oct 22, 2021Updated 4 years ago
- R package for weighted model metrics☆11Apr 12, 2025Updated 11 months ago
- Capgemini UK Software Engineering Grade Ladder☆12Apr 12, 2023Updated 2 years ago