dell-research-harvard / linktransformer
A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
☆134 · Nov 5, 2025 · Updated 3 months ago
Alternatives and similar repositories for linktransformer
Users who are interested in linktransformer are comparing it to the libraries listed below.
- The SQL/Ibis powered sklearn of record linkage · ☆23 · Updated this week
- An End-to-End Evaluation Framework for Entity Resolution Systems · ☆36 · Dec 3, 2023 · Updated 2 years ago
- Continuous Benchmark of Filtering methods for Entity Resolution · ☆11 · Jul 20, 2025 · Updated 6 months ago
- Repository for in-class material for Data Bootcamp · ☆13 · May 18, 2019 · Updated 6 years ago
- Probabilistic Record Linkage Using Pretrained Text Embeddings · ☆16 · Jan 23, 2026 · Updated 3 weeks ago
- 📰🗞 New York Times data · ☆12 · Aug 4, 2018 · Updated 7 years ago
- This repository contains code and extensive prompt examples to reproduce and extend the experiments in our papers "Using ChatGPT for Enti… · ☆65 · Oct 18, 2024 · Updated last year
- This repository aims to build a comprehensive literature review of the economics of open source software. Contributions welcome. · ☆12 · Apr 2, 2025 · Updated 10 months ago
- Specification Curve is a Python package that performs specification curve analysis: exploring how a coefficient varies under multiple dif… · ☆26 · Jan 29, 2026 · Updated 2 weeks ago
- ☆28 · Feb 9, 2026 · Updated last week
- pseudopeople is a Python package that generates realistic simulated data about a fictional United States population, designed for use in … · ☆24 · Jan 21, 2026 · Updated 3 weeks ago
- A fast TUI application (with optional webui) to visually navigate and inspect JSON and JSONL data. Easily localize parse errors in large … · ☆15 · Sep 30, 2024 · Updated last year
- A subset of the SQL dialect for ClickHouse DB. · ☆13 · Jan 9, 2023 · Updated 3 years ago
- Blocking records for record linkage and data deduplication based on ANN algorithms in Python. · ☆18 · Nov 28, 2025 · Updated 2 months ago
- An R package for blocking records for record linkage / data deduplication based on approximate nearest neighbours algorithms. · ☆14 · Feb 8, 2026 · Updated last week
- PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolut… · ☆161 · Nov 18, 2022 · Updated 3 years ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" · ☆12 · Mar 25, 2025 · Updated 10 months ago
- Implements several Markov chain Monte Carlo (MCMC) algorithms for the latent Dirichlet allocation (LDA) model · ☆11 · Feb 11, 2020 · Updated 6 years ago
- ☆18 · Updated this week
- ☆13 · Jan 10, 2023 · Updated 3 years ago
- ☆11 · Apr 2, 2021 · Updated 4 years ago
- ☆32 · Mar 31, 2023 · Updated 2 years ago
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) · ☆18 · Jun 24, 2024 · Updated last year
- blackmaRble: retrieve, wrangle and plot VIIRS Black Marble nighttime light data in R · ☆17 · Dec 21, 2023 · Updated 2 years ago
- Text generation using language models with multiple exit heads · ☆16 · Sep 18, 2025 · Updated 4 months ago
- Repository for performing Blocking using Deep Learning based on the paper "Deep Learning for Blocking in Entity Matching: A Design Space … · ☆32 · Apr 5, 2023 · Updated 2 years ago
- ☆15 · Aug 11, 2022 · Updated 3 years ago
- The repository for PoliPrompt · ☆18 · Oct 20, 2024 · Updated last year
- ☆22 · Jul 15, 2024 · Updated last year
- PyTorch implementation for OCT-GAN Neural ODE-based Conditional Tabular GANs (WWW 2021) · ☆15 · Oct 10, 2022 · Updated 3 years ago
- Income Accounting · ☆17 · Feb 11, 2021 · Updated 5 years ago
- Biographical data of political candidates in India; rich data on Indian MPs · ☆12 · Jun 8, 2023 · Updated 2 years ago
- ☆19 · Jan 4, 2024 · Updated 2 years ago
- Implementation of the paper "Deep Indexed Active Learning for Matching Heterogeneous Entity Representations" · ☆17 · Dec 20, 2021 · Updated 4 years ago
- econometrics in pytorch · ☆28 · Feb 6, 2026 · Updated last week
- UI for JedAI Toolkit · ☆17 · May 20, 2022 · Updated 3 years ago
- Computational Text Analysis Workshop Materials · ☆36 · May 6, 2016 · Updated 9 years ago
- A powerful and modular toolkit for record linkage and duplicate detection in Python · ☆1,045 · Feb 21, 2024 · Updated last year
- Similarity and distance measures for clustering and record linkage applications in R · ☆18 · Sep 23, 2025 · Updated 4 months ago