gpoulter / pydedupe
(Archived) A Python library for record linkage and deduplication.
☆19 · Updated 9 months ago
Alternatives and similar repositories for pydedupe:
Users who are interested in pydedupe are comparing it to the libraries listed below.
- Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors. ☆52 · Updated 3 years ago
- A maximum-strength name parser for record linkage. ☆36 · Updated 5 months ago
- Python wrapper for a C++ Double Metaphone implementation. ☆15 · Updated last year
- Streaming newline-delimited JSON I/O. ☆12 · Updated last year
- Generate Elasticsearch indexes based on Table Schema descriptors. ☆10 · Updated 3 years ago
- CSV on the Web. ☆38 · Updated 2 months ago
- A Datasette plugin providing an MLOps platform to train, evaluate and predict with machine learning models. ☆15 · Updated last week
- Utility library to turn country names into ISO two-letter codes. ☆66 · Updated last month
- Provide partial dates and retain the date precision through processing. ☆13 · Updated 2 years ago
- A browser user interface for manual labeling of record pairs. ☆42 · Updated last year
- Versioned domain model. Python library for revisioning/versioning of databases. ☆44 · Updated 4 years ago
- Datasette plugin for modifying table schemas. ☆17 · Updated 4 months ago
- Binary Python bindings for poppler utils for content extraction. ☆42 · Updated 3 years ago
- Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames. ☆25 · Updated 4 years ago
- Generate SQL tables, load and extract data, based on JSON Table Schema descriptors. ☆62 · Updated last year
- Extends zip() and itertools.zip_longest() to generate named tuples. ☆23 · Updated 5 years ago
- agate-sql adds SQL read/write support to agate. ☆19 · Updated 10 months ago
- Tool for running transformations on columns in a SQLite database. ☆30 · Updated 3 years ago
- A simple command-line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆19 · Updated 2 years ago
- ☆13 · Updated 5 years ago
- Markdown -> IPython conversion tool. ☆15 · Updated 9 years ago
- Write Datasette canned queries as plain SQL files. ☆13 · Updated 2 years ago
- A Python module that will check for package updates. ☆28 · Updated 3 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python. ☆14 · Updated last year
- Asynchronous Server-Sent Events (SSE) client for Python 3. ☆23 · Updated 8 months ago
- Allows fast prototyping in Python for OpenCV. ☆19 · Updated 4 years ago
- Graph extraction and NLP analysis for Baleen Corpora. ☆18 · Updated 8 years ago
- Python context manager to communicate with a subprocess using iterables: for when data is too big to fit in memory and has to be streamed. ☆7 · Updated 3 months ago