gpoulter / pydedupe
(Archived) A Python library for record linkage and deduplication.
☆19 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for pydedupe
- A simple command-line interface to the datamade/dedupe library. ☆42 · Updated last year
- Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors. ☆52 · Updated 3 years ago
- Search 'from' and 'to' strings to learn a text-cleaning mapping. ☆17 · Updated 9 years ago
- Extract, parse and populate templates from strings. ☆27 · Updated 5 years ago
- A maximum-strength name parser for record linkage. ☆32 · Updated 3 months ago
- Provide partial dates and retain the date precision through processing. ☆13 · Updated last year
- Utility library to turn country names into ISO two-letter codes. ☆66 · Updated 3 weeks ago
- Streaming newline-delimited JSON I/O. ☆12 · Updated last year
- Generate Elasticsearch indexes based on Table Schema descriptors. ☆10 · Updated 3 years ago
- Python wrapper for a C++ Double Metaphone implementation. ☆15 · Updated last year
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 · Updated 8 years ago
- Versioned domain model. Python library for revisioning/versioning of databases. ☆44 · Updated 3 years ago
- A Python library for defining rule-based overrides on messy data. ☆12 · Updated 9 months ago
- CSV inspection. ☆10 · Updated last year
- An easy interface for documenting data packages. ☆19 · Updated 6 years ago
- Enhance your feature engineering workflow with Kodiak. ☆20 · Updated last year
- Python implementation of anonymous linkage using cryptographic linkage keys. ☆63 · Updated 5 months ago
- ☆13 · Updated 5 years ago
- A Python module that checks for package updates. ☆28 · Updated 3 years ago
- Python language parser for a tabular format for structured metadata. http://metatab.org ☆17 · Updated last year
- CSV on the web. ☆37 · Updated 2 weeks ago
- Generate SQL tables, load and extract data, based on JSON Table Schema descriptors. ☆61 · Updated last year
- A browser user interface for manual labeling of record pairs. ☆41 · Updated last year
- Framework for processing data packages in pipelines of modular components. ☆119 · Updated last year
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · Updated 9 years ago
- International address formatter that follows each country's standard formatting rules. ☆26 · Updated 3 years ago
- Transform flat data structures into nested object graphs matching JSON Schema definitions. ☆28 · Updated 8 years ago
- Detect and classify pagination links. ☆14 · Updated 4 years ago
- Self-Service Semantic Suite (S4). ☆17 · Updated 8 years ago
- 🛠️ A library for mapping CKAN metadata <=> Frictionless metadata. ☆9 · Updated last year