deduplipy: Python package for deduplication/entity resolution using active learning (☆82, last updated Aug 24, 2024)
Alternatives and similar repositories for deduplipy
Users that are interested in deduplipy are comparing it to the libraries listed below.
- Spark Monitoring (☆13, last updated Feb 28, 2023)
- MinHash implementation in Python (☆12, last updated Aug 24, 2024)
- Learning BPE embeddings by first learning a segmentation model and then training word2vec (☆19, last updated Dec 18, 2022)
- Get a list of deduped files on a ZFS filesystem (☆13, last updated Oct 14, 2020)
- No description (☆11, last updated Feb 26, 2021)
- A Python tool to search for and remove duplicated files in messy datasets (☆15, last updated Dec 23, 2024)
- Starbucks: Improved Training for 2D Matryoshka Embeddings (☆23, last updated Jun 30, 2025)
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning… (☆22, last updated Nov 2, 2021)
- Doubt your data, find bad labels (☆516, last updated Jul 15, 2024)
- SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems (☆152, last updated Apr 19, 2025)
- Makes it easy to use altair from FastHTML (☆28, last updated Oct 9, 2024)
- Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge (☆17, last updated Nov 16, 2021)
- Bag of, not words, but tricks! (☆68, last updated Oct 31, 2023)
- Fast fuzzy text search (☆12, last updated May 16, 2023)
- Motivational website to do something special this month (☆21, last updated Jan 11, 2024)
- A natural language date parser, a Python version of chrono.js (☆26, last updated May 31, 2025)
- A Polars plugin for encrypting and decrypting data using the AES-GCM-SIV algorithm in Rust (☆11, last updated Jan 8, 2025)
- It's a cooler way to store simple linear models (☆26, last updated Jul 15, 2024)
- Very basic solar regression (☆16, updated this week)
- Analyzing the tree of imports of running Python code (☆12, last updated Feb 17, 2023)
- Super Simple Similarities Service (☆155, last updated Apr 11, 2025)
- Find near-duplicate documents using minhashing implemented in Go (☆16, last updated Dec 22, 2015)
- Fast, lightweight toy container system (☆11, last updated Oct 18, 2020)
- Advanced data wrangling for Python (☆16, last updated Sep 5, 2023)
- A powerful and modular toolkit for record linkage and duplicate detection in Python (☆1,046, last updated Feb 21, 2024)
- No description (☆15, last updated Aug 3, 2021)
- A textual TUI for Prodigy (☆16, last updated Jun 8, 2023)
- UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn (☆42, last updated Mar 5, 2026)
- A Python library/model for creating coreferences between AMR graph nodes (☆11, last updated Dec 11, 2022)
- A starter template for Svelte apps with TailwindCSS (☆11, last updated Jan 7, 2023)
- Python Clustering (☆10, last updated Jun 27, 2017)
- Python library and dashboard for hyperparameter search and model training for computer vision tasks based on PyTorch, Optuna, FiftyOne, D… (☆17, last updated Jul 14, 2023)
- Content Defined Chunking playground (☆50, last updated Mar 26, 2026)
- A set of tools for Dynamic Design Patterns in Python (☆11, last updated Oct 6, 2023)
- Scala embedded universal probabilistic programming language (☆11, last updated Apr 15, 2021)
- An implementation of the beta distribution probability density function in JavaScript. This implementation overcomes the problem of large… (☆10, last updated Oct 29, 2019)
- pytest plugin for a better developer experience when working with the PyTorch test suite (☆44, last updated Dec 13, 2021)
- A Go library implementing a buzhash rolling hash function (☆31, last updated Aug 16, 2016)
- Extra blocks for scikit-learn pipelines (☆1,388, last updated Apr 2, 2026)