Python package for deduplication/entity resolution using active learning
☆82 · Aug 24, 2024 · Updated last year
Alternatives and similar repositories for deduplipy
Users that are interested in deduplipy are comparing it to the libraries listed below.
- MinHash implementation in Python ☆12 · Aug 24, 2024 · Updated last year
- Get a list of deduped files on a ZFS filesystem ☆13 · Oct 14, 2020 · Updated 5 years ago
- ☆17 · Mar 17, 2022 · Updated 4 years ago
- ☆11 · Feb 26, 2021 · Updated 5 years ago
- A Python tool to search for and remove duplicated files in messy datasets ☆15 · Dec 23, 2024 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆23 · Jun 30, 2025 · Updated 10 months ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning… ☆22 · Nov 2, 2021 · Updated 4 years ago
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- Doubt your data, find bad labels. ☆516 · Jul 15, 2024 · Updated last year
- Fast fuzzy text search ☆12 · May 16, 2023 · Updated 2 years ago
- A Python library for creating adversarial splits ☆14 · Jul 24, 2022 · Updated 3 years ago
- motivational website to do something special this month ☆21 · Jan 11, 2024 · Updated 2 years ago
- Interactive math react components in jupyter ☆11 · Jul 25, 2023 · Updated 2 years ago
- A natural language date parser. (Python version of chrono.js) ☆26 · May 31, 2025 · Updated 11 months ago
- RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems ☆17 · May 25, 2020 · Updated 5 years ago
- A Polars plugin for encrypting and decrypting data using the AES-GCM-SIV algorithm in Rust ☆11 · Jan 8, 2025 · Updated last year
- It's a cooler way to store simple linear models. ☆26 · Jul 15, 2024 · Updated last year
- Very basic solar regression ☆16 · Updated this week
- Analyzing the tree of imports of running Python code. ☆12 · Feb 17, 2023 · Updated 3 years ago
- Super Simple Similarities Service ☆155 · Apr 11, 2025 · Updated last year
- A powerful and modular toolkit for record linkage and duplicate detection in Python ☆1,049 · Feb 21, 2024 · Updated 2 years ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,461 · Jul 29, 2025 · Updated 9 months ago
- ERPL is a DuckDB extension to connect to API based ecosystems via standard interfaces like OData, GraphQL and REST. This works e.g. for S… ☆27 · May 2, 2026 · Updated last week
- ☆15 · Aug 3, 2021 · Updated 4 years ago
- Code for the paper "Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond… ☆24 · May 31, 2022 · Updated 3 years ago
- A python library / model for creating co-references between AMR graph nodes. ☆11 · Dec 11, 2022 · Updated 3 years ago
- JS snippet to send codeblock contents as a query string ☆51 · Apr 6, 2026 · Updated last month
- A set of tools for Dynamic Design Patterns in Python ☆11 · Oct 6, 2023 · Updated 2 years ago
- Free and open source Tableau alternative that generates Python Pandas code ☆12 · Aug 23, 2018 · Updated 7 years ago
- An implementation of the beta distribution probability density function in Javascript. This implementation overcomes the problem of large… ☆10 · Oct 29, 2019 · Updated 6 years ago
- pytest plugin for a better developer experience when working with the PyTorch test suite ☆44 · Dec 13, 2021 · Updated 4 years ago
- Fast duplicate file detection library ☆26 · Jan 5, 2017 · Updated 9 years ago
- Missing data amputation and exploration functions for Python ☆73 · Dec 17, 2022 · Updated 3 years ago
- Keep your configuration files in sync ☆17 · Jan 6, 2026 · Updated 4 months ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆35 · May 24, 2024 · Updated last year
- Resources for PVLDB 2023 submission ☆28 · Aug 28, 2024 · Updated last year
- trispace is a laser-cut maker kit that can be used to create just about anything geometric: sculptures, toys, buildings and prototypes. ☆15 · Apr 26, 2016 · Updated 10 years ago
- A Python Natural Language Processing Toolkit for Electronic Health Record Texts ☆13 · May 24, 2023 · Updated 2 years ago
- Dictionary of obscure words ☆12 · Aug 15, 2023 · Updated 2 years ago