Python package for deduplication/entity resolution using active learning
☆82Aug 24, 2024Updated last year
Alternatives and similar repositories for deduplipy
Users who are interested in deduplipy are comparing it to the libraries listed below.
- Find duplicate text files.☆14Jan 14, 2025Updated last year
- MinHash implementation in Python☆12Aug 24, 2024Updated last year
- LLM Oracle is a GPT-4 powered tool for predicting future events. It's like a Magic 8 Ball that is able to perform basic research, calcula…☆17May 27, 2023Updated 3 years ago
- Deduplication for cfDNA sequencing data☆11Jul 5, 2017Updated 8 years ago
- Get a list of deduped files on a ZFS filesystem☆13Oct 14, 2020Updated 5 years ago
- ☆11Feb 26, 2021Updated 5 years ago
- ☆17Mar 17, 2022Updated 4 years ago
- A Python tool to search for and remove duplicated files in messy datasets☆15Dec 23, 2024Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆24Jun 30, 2025Updated last year
- tkinter desktop chat interface with OpenAI's gpt-3.5-turbo API☆11Apr 29, 2023Updated 3 years ago
- Doubt your data, find bad labels.☆515Jul 15, 2024Updated last year
- SHAP-based validation for linear and tree-based models. Applied to binary, multiclass and regression problems.☆153Apr 19, 2025Updated last year
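- Document deduplication for search engines: addresses semantically duplicate documents using a semantic fingerprint algorithm based on multiple hashes.☆11Aug 17, 2013Updated 12 years ago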
- Makes it easy to use altair from FastHTML☆28Oct 9, 2024Updated last year
- Mixpost Installation with Docker Containers☆15Mar 15, 2023Updated 3 years ago
- Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge☆17Nov 16, 2021Updated 4 years ago
- Just some FastHTML demos for safekeeps☆13Dec 10, 2024Updated last year
- Bag of, not words, but tricks!☆68Jun 11, 2026Updated 2 weeks ago
- motivational website to do something special this month☆21Jan 11, 2024Updated 2 years ago
- Pile Deduplication Code☆18May 15, 2023Updated 3 years ago
- Small python package to measure OCR quality and other related metrics.☆27Feb 19, 2024Updated 2 years ago
- RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems☆17May 25, 2020Updated 6 years ago
- It's a cooler way to store simple linear models.☆26Jul 15, 2024Updated last year
- Very basic solar regression☆16Updated this week
- Analyzing the tree of imports of running Python code.☆12Feb 17, 2023Updated 3 years ago
- Super Simple Similarities Service☆154Apr 11, 2025Updated last year
- Find near-duplicate documents using minhashing implemented in Go.☆16Dec 22, 2015Updated 10 years ago
- LAGOS-AND: A Large Gold Standard Dataset for Scholarly Author Name Disambiguation☆11Dec 8, 2022Updated 3 years ago
- 🕹️ Group and deduplicate concurrent tasks☆31May 15, 2026Updated last month
- A powerful and modular toolkit for record linkage and duplicate detection in Python☆1,055Feb 21, 2024Updated 2 years ago
- ☆15Aug 3, 2021Updated 4 years ago
- A textual TUI for Prodigy☆16Jun 8, 2023Updated 3 years ago
- Code for the paper "Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond…☆24May 31, 2022Updated 4 years ago
- UMAP in pure MLX for Apple Silicon. 30x faster than umap-learn.☆44Mar 5, 2026Updated 3 months ago
- A python library / model for creating co-references between AMR graph nodes.☆11Dec 11, 2022Updated 3 years ago
- A starter template for Svelte apps with TailwindCSS☆11Jan 7, 2023Updated 3 years ago
- JS snippet to send codeblock contents as a query string☆51Jun 10, 2026Updated 3 weeks ago
- This library provides all icons of React-icons wrapped for ReFlex framework☆10Jul 3, 2023Updated 2 years ago
- Free and open source Tableau alternative that generates Python Pandas code☆12Aug 23, 2018Updated 7 years ago