Python package for deduplication/entity resolution using active learning
☆83 · Aug 24, 2024 · Updated last year
Alternatives and similar repositories for deduplipy
Users that are interested in deduplipy are comparing it to the libraries listed below.
- MinHash implementation in Python ☆12 · Aug 24, 2024 · Updated last year
- ☆11 · Feb 26, 2021 · Updated 5 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 · Jun 30, 2025 · Updated 9 months ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning… ☆22 · Nov 2, 2021 · Updated 4 years ago
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- Doubt your data, find bad labels. ☆516 · Jul 15, 2024 · Updated last year
- Makes it easy to use altair from FastHTML ☆28 · Oct 9, 2024 · Updated last year
- Mixpost Installation with Docker Containers ☆14 · Mar 15, 2023 · Updated 3 years ago
- Bag of, not words, but tricks! ☆68 · Oct 31, 2023 · Updated 2 years ago
- A Python library for creating adversarial splits ☆14 · Jul 24, 2022 · Updated 3 years ago
- motivational website to do something special this month ☆21 · Jan 11, 2024 · Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆27 · Feb 19, 2024 · Updated 2 years ago
- It's a cooler way to store simple linear models. ☆27 · Jul 15, 2024 · Updated last year
- Analyzing the tree of imports of running Python code. ☆12 · Feb 17, 2023 · Updated 3 years ago
- UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn. ☆40 · Mar 5, 2026 · Updated 3 weeks ago
- Super Simple Similarities Service ☆155 · Apr 11, 2025 · Updated 11 months ago
- Advanced data wrangling for python ☆17 · Sep 5, 2023 · Updated 2 years ago
- AI web parser library + CLI ☆48 · May 5, 2025 · Updated 10 months ago
- A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. ☆4,448 · Jul 29, 2025 · Updated 8 months ago
- ERPL is a DuckDB extension to connect to API based ecosystems via standard interfaces like OData, GraphQL and REST. This works e.g. for S… ☆26 · Updated this week
- A textual TUI for Prodigy ☆16 · Jun 8, 2023 · Updated 2 years ago
- Code for the paper "Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond… ☆24 · May 31, 2022 · Updated 3 years ago
- Keep your configuration files in sync ☆16 · Jan 6, 2026 · Updated 2 months ago
- A python library / model for creating co-references between AMR graph nodes. ☆11 · Dec 11, 2022 · Updated 3 years ago
- ☆17 · Nov 15, 2025 · Updated 4 months ago
- A starter template for Svelte apps with TailwindCSS ☆11 · Jan 7, 2023 · Updated 3 years ago
- Python Clustering ☆10 · Jun 27, 2017 · Updated 8 years ago
- JS snippet to send codeblock contents as a query string ☆51 · Mar 18, 2026 · Updated last week
- Scala embedded universal probabilistic programming language ☆11 · Apr 15, 2021 · Updated 4 years ago
- An implementation of the beta distribution probability density function in Javascript. This implementation overcomes the problem of large… ☆10 · Oct 29, 2019 · Updated 6 years ago
- Extra blocks for scikit-learn pipelines. ☆1,386 · Updated this week
- Missing data amputation and exploration functions for Python ☆72 · Dec 17, 2022 · Updated 3 years ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆35 · May 24, 2024 · Updated last year
- Resources for PVLDB 2023 submission ☆27 · Aug 28, 2024 · Updated last year
- A simple wrapper to use Pandas Profiling easily in Kedro ☆17 · Apr 12, 2021 · Updated 4 years ago
- A Python Natural Language Processing Toolkit for Electronic Health Record Texts ☆13 · May 24, 2023 · Updated 2 years ago
- Dictionary of obscure words ☆12 · Aug 15, 2023 · Updated 2 years ago
- End-to-End Neural Event Coreference Resolution ☆11 · Jun 18, 2023 · Updated 2 years ago
- Just another sentiment wrapper. ☆18 · Dec 11, 2021 · Updated 4 years ago