fritshermans / deduplipy
Python package for deduplication/entity resolution using active learning
☆83 · Aug 24, 2024 · Updated last year
Alternatives and similar repositories for deduplipy
Users that are interested in deduplipy are comparing it to the libraries listed below
- Analyzing the tree of imports of running Python code. ☆12 · Feb 17, 2023 · Updated 2 years ago
- Fast fuzzy text search ☆11 · May 16, 2023 · Updated 2 years ago
- ☆15 · Aug 3, 2021 · Updated 4 years ago
- Keep your configuration files in sync ☆16 · Jan 6, 2026 · Updated last month
- Find duplicate text files. ☆15 · Jan 14, 2025 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 · Jun 30, 2025 · Updated 7 months ago
- Integration with (approximate) nearest neighbors libraries for scikit-learn + clustering based on kNN-graphs. ☆23 · Updated this week
- Learning BPE embeddings by first learning a segmentation model and then training word2vec ☆19 · Dec 18, 2022 · Updated 3 years ago
- ☆17 · Mar 17, 2022 · Updated 3 years ago
- Applying Snorkel to SuperGLUE ☆26 · Dec 16, 2019 · Updated 6 years ago
- motivational website to do something special this month ☆21 · Jan 11, 2024 · Updated 2 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆19 · Aug 28, 2023 · Updated 2 years ago
- A module for solving linear programming problems on Python. ☆19 · Mar 22, 2023 · Updated 2 years ago
- Omnipy is a high level Python library for type-driven data wrangling and scalable workflow orchestration (under development) ☆25 · Feb 5, 2026 · Updated last week
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆66 · Feb 2, 2026 · Updated last week
- Makes it easy to use altair from FastHTML ☆28 · Oct 9, 2024 · Updated last year
- Proofs of concept for workflows that augment Obsidian.md knowledge management via NLP analytics & modelling ☆24 · Aug 16, 2022 · Updated 3 years ago
- It's a cooler way to store simple linear models. ☆27 · Jul 15, 2024 · Updated last year
- A ninja python package that unifies the Google Earth Engine ecosystem. ☆66 · Feb 9, 2026 · Updated last week
- Doubt your data, find bad labels. ☆516 · Jul 15, 2024 · Updated last year
- Bag of, not words, but tricks! ☆68 · Oct 31, 2023 · Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆27 · Feb 19, 2024 · Updated last year
- OlliePy is a python package which can help data scientists in exploring their data and evaluating and analysing their machine learning ex… ☆53 · Dec 10, 2023 · Updated 2 years ago
- Super Simple Similarities Service ☆155 · Apr 11, 2025 · Updated 10 months ago
- SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batchi… ☆35 · May 24, 2024 · Updated last year
- A lightweight implementation of shapes drawn across a geo-temporal plane. ☆12 · Jan 27, 2026 · Updated 2 weeks ago
- Safitty is a wrapper on JSON/YAML configs for Python ☆30 · Mar 19, 2020 · Updated 5 years ago
- edaSQL is a python library to bridge the SQL with Exploratory Data Analysis where you can connect to the Database and insert the queries.… ☆10 · Nov 14, 2021 · Updated 4 years ago
- Missing data amputation and exploration functions for Python ☆72 · Dec 17, 2022 · Updated 3 years ago
- Stackable cache classes for sharing, encryption, statistics and more on top of cachetools, redis and memcached ☆36 · Dec 14, 2025 · Updated 2 months ago
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- Extra blocks for scikit-learn pipelines. ☆1,377 · Updated this week
- Build dashboards in Jupyter Notebook with numeric and chart boxes ☆216 · Jul 27, 2022 · Updated 3 years ago
- JS snippet to send codeblock contents as a query string ☆51 · Jun 11, 2025 · Updated 8 months ago
- captures logs and makes cron more fun ☆81 · Jan 31, 2026 · Updated 2 weeks ago
- Django app that builds `template` and `elements` components from the Government Digital Services style guide ☆10 · Dec 20, 2018 · Updated 7 years ago
- SciCount is a tool focused on counting and classifying objects in image-like data and scientific images, with training and example datas… ☆11 · Oct 24, 2023 · Updated 2 years ago
- ☆11 · Jul 3, 2020 · Updated 5 years ago
- Some microbenchmarks and design docs before commencement ☆12 · Feb 1, 2021 · Updated 5 years ago