An R package for blocking records for record linkage and data deduplication, based on approximate nearest neighbour algorithms.
☆14 · Feb 8, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for blocking
Users who are interested in blocking compare it with the libraries listed below.
- Blocking records for record linkage and data deduplication based on ANN algorithms in Python. ☆19 · Nov 28, 2025 · Updated 3 months ago
- Clustering and Link Prediction Evaluation in R ☆14 · Sep 23, 2023 · Updated 2 years ago
- Lightweight validation tool for checking function arguments and data analysis scripts. ☆12 · Dec 24, 2024 · Updated last year
- Perform Bayesian record linkage with a one-to-one matching assumption. ☆11 · Jul 9, 2020 · Updated 5 years ago
- An R package for modern methods for non-probability samples ☆54 · Nov 4, 2025 · Updated 3 months ago
- Similarity and distance measures for clustering and record linkage applications in R ☆18 · Sep 23, 2025 · Updated 5 months ago
- pseudopeople is a Python package that generates realistic simulated data about a fictional United States population, designed for use in … ☆24 · Feb 20, 2026 · Updated last week
- SAE Unit/area Models and Methods for Estimation in R ☆26 · Aug 29, 2025 · Updated 6 months ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆59 · Jun 10, 2021 · Updated 4 years ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆36 · Dec 3, 2023 · Updated 2 years ago
- mixgb: multiple imputation through XGBoost ☆26 · Jan 20, 2026 · Updated last month
- ☆10 · Nov 6, 2025 · Updated 3 months ago
- A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and ne… ☆43 · Feb 23, 2023 · Updated 3 years ago
- A package for Bilateral and Multilateral Price Index Calculations ☆11 · Feb 18, 2026 · Updated last week
- Introduction to programming using R ☆10 · Mar 14, 2024 · Updated last year
- Build SVG Custom User Interface in R, rmd, qmd and Shiny ☆20 · Apr 10, 2025 · Updated 10 months ago
- This repository contains CROW, the Clerical Resolution Online Widget, an open-source project designed to help data linkers with their cle… ☆10 · Feb 20, 2026 · Updated last week
- Eurostat Big Data Hackathon 2021 ☆15 · Jan 15, 2026 · Updated last month
- ☆11 · Dec 12, 2025 · Updated 2 months ago
- Automate downloading of meteorological/hydrological datasets from IMGW-PIB ☆12 · Aug 11, 2020 · Updated 5 years ago
- Spatial Seemingly Unrelated Regressions ☆11 · Apr 22, 2022 · Updated 3 years ago
- Dynamically Generate Quarto Syntax ☆25 · Jul 7, 2025 · Updated 7 months ago
- List of entity resolution software and resources. ☆109 · Feb 22, 2025 · Updated last year
- Record Linkage Toolkit for R ☆47 · Jan 8, 2026 · Updated last month
- Supporting material of talks given at useR!2024 ☆13 · Jul 12, 2024 · Updated last year
- Survey statistics in a database ☆12 · Aug 27, 2024 · Updated last year
- Spatial Optimization for R ☆36 · Jan 27, 2026 · Updated last month
- An efficient method for sampling from the Gram-Schmidt Walk Design. ☆13 · Sep 16, 2023 · Updated 2 years ago
- Benchmark scripts for comparing tutorials in PyTorch and JAX ☆14 · Aug 25, 2022 · Updated 3 years ago
- ☆22 · Nov 7, 2025 · Updated 3 months ago
- A Quarto extension to create multiple choice questions (quizzes) in HTML documents ☆13 · Dec 26, 2023 · Updated 2 years ago
- Ecological mixed-effects ordination with lme4 ☆12 · May 9, 2016 · Updated 9 years ago
- Embed Pyodide-powered, entirely serverless Gradio apps into Quarto documents. ☆16 · Mar 3, 2025 · Updated 11 months ago
- Generates Markdown documentation from Python module docstrings ☆15 · Aug 28, 2025 · Updated 6 months ago
- Typed, annotated vectors for well-documented datasets ☆11 · Jan 30, 2026 · Updated last month
- 🏆 SUCCESS-GS: Survey of Compactness and Compression for Efficient Static and Dynamic Gaussian Splatting ☆20 · Feb 4, 2026 · Updated 3 weeks ago
- A review of the most popular topic modeling techniques, featuring hands-on tutorials. ☆12 · Apr 29, 2025 · Updated 10 months ago
- CUDA code with exact k-NN algorithm for multiple GPU system. ☆12 · Jul 5, 2024 · Updated last year
- Python 3 API Client for Polish REGON database (Baza Internetowa Regon - BIR) ☆17 · Oct 20, 2025 · Updated 4 months ago